Abstract

Search is a central problem in artificial intelligence, and BFS and DFS are
the two most fundamental ways to search. In this report we derive results
for average BFS and DFS runtime: For tree search, we employ a probabilistic
model of goal distribution; for graph search, the analysis depends on an
additional statistic of path redundancy and average branching factor. As
an application, we use the results on two concrete grammar problems. The
runtime estimates can be used to select the faster out of BFS and DFS
for a given problem, and may form the basis for further analysis of more
advanced search methods. Finally, we verify our results experimentally;
the analytical approximations come surprisingly close to empirical reality.

1 Introduction


A wide range of problems in artificial intelligence can be naturally formulated
as search problems (Russell and Norvig 2010; Edelkamp and Schrödl 2012).
Examples include planning, scheduling, and combinatorial optimisation (TSP,
graph colouring, etc.), as well as various toy problems such as Sudoku and
the Towers of Hanoi. Search problems can be solved by exploring the space of
possible solutions in a more or less systematic or clever order. Meta-heuristics
are general search methods not aimed at a specific type of problem. They interact
with the problem through abstract properties such as a neighbourhood relation
on the space of feasible solutions, and a heuristic or objective function. One
possible way to create flexible meta-heuristics is to combine a portfolio of search
algorithms, and use problem features to predict which search algorithm works
best. Predicting the best algorithm is sometimes known as the algorithm
selection problem (Rice 1975).


A number of studies have approached the algorithm selection problem with
machine learning techniques (Kotthoff 2014; Hutter et al. 2014). While demonstrably
a feasible path, machine learning tends to be used as a black box, offering
little insight into why a certain method works better on a given problem. On the
other hand, most existing analytical results focus on worst-case big-O analysis,
which is often less useful than average-case analysis when selecting an algorithm.

An important worst-case result is Knuth's (1975) simple but useful technique
for estimating the depth-first search tree size. Kilby et al. (2006) used it for
algorithm selection in the SAT problem. See also the extensions by Purdom
(1978), Chen (1992), and Lelis et al. (2013). Analytical IDA* runtime predictions
based on problem features were obtained by Korf et al. (2001) and Zahavi et al.
(2010). In this study we focus on theoretical analysis of average runtime of BFS
and DFS. While the IDA* results can be interpreted to give rough estimates for
average BFS search time, no similar results are available for DFS.

To facilitate the analysis, we use a probabilistic model of goal distribution 
and graph structure. Currently no method to automatically estimate the model 
parameters is available (this is an important line of future research). Regardless, 
the analysis still offers important theoretical insights into BFS and DFS search.
The parameters of the model can also be interpreted as a Bayesian prior belief
about goal distribution. A precise understanding of BFS and DFS performance
is likely to have both practical and theoretical value: Practical, as BFS and
DFS are both widely employed; theoretical, as BFS and DFS are the two most
fundamental ways to search, so their properties may be useful in analytical
approaches to more advanced search algorithms as well. In particular, our results 
may be a first step to selecting search strategy based on the (local) topology of 
the problem graph, an aspect ignored by most meta-heuristics. 

Our main contribution is an analysis of expected BFS and DFS runtime
as a function of tree depth, goal level, branching factor and path redundancy
(from Section 5 onwards). We also verify the results experimentally (Section 8).
Most of these results will be published as (Everitt and Hutter 2015a,b).
Definitions of different types of search problems and search algorithms are
given in Sections 2 and 3, and a review of related work can be found in
Section 4. Conclusions and outlooks come in Sections 9 and 10. Finally,
Appendix A provides a list of notation, and Appendices B and C contain lists
of search problems and potentially useful topological features.


2 Graph search problems 

A common feature of many search problems is that there are a set of operations 
for cheaply modifying a proposed solution into similar proposed solutions. This 
makes it natural to view the problem as a graph search problem, where proposed 
solutions are states or nodes, and the modification operations induce directed 
edges. 

We define two kinds of graph search problems. 

Definition 1 (Constructive graph search problem). A constructive graph search
problem consists of a state space $S$, a starting state $s_0 \in S$, and the following
efficiently computable functions:

1. Neighbourhood $N : S \to 2^S$

2. Goal check $C : S \to \{0,1\}$

3. Edge cost $EC : (S \times S) \to \mathbb{R}^+$

4. Heuristic $h : S \to \mathbb{R}^+$

5. Contract (tree/graph, solution depth, number of goals, admissible/consistent
heuristic)

Items 4 and 5 are optional. A constructive solution is a path $s_0, \dots, s_n$ from the
starting state $s_0$ to a goal state $s_n$ with $C(s_n) = 1$. The solution quality of the
path $s_0, \dots, s_n$ is Q

For instance, planning problems are formalised as constructive graph search 
problems. The neighbourhood function gives a list of states reachable by a single 
action from the given state. The goal check indicates whether a state is a goal, 
and the edge cost indicates how costly it is to use a certain action (how it affects 
the solution quality). A solution is a sequence of actions leading to a goal state. 

A heuristic may give an estimate of how close the given state is to a goal 
state (in terms of edge cost). An important class of heuristics are the admissible 
ones, that never overestimate the distance. Consistent heuristics additionally 
respect the triangle inequality. The properties of the heuristics may be given 
in the contract. The contract is not formalised here, but may be given as a 
dictionary of known problem properties. 
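
As an illustration of Definition 1, the components of a constructive problem can be collected in a small Python container. This is only a sketch: the class name `ConstructiveProblem`, its field names, the unit-cost fallback when no edge cost is given, and the toy number-line problem are all assumptions made for the example, not part of the formal definition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Iterable, Optional

State = Hashable


@dataclass
class ConstructiveProblem:
    """Hypothetical container mirroring the items of Definition 1."""
    start: State                                          # starting state s_0
    neighbours: Callable[[State], Iterable[State]]        # N
    is_goal: Callable[[State], bool]                      # C
    edge_cost: Optional[Callable[[State, State], float]] = None   # EC
    heuristic: Optional[Callable[[State], float]] = None          # h
    contract: Dict[str, object] = field(default_factory=dict)     # known properties


def path_cost(problem, path):
    """Summed edge cost of a path (assumption: unit cost if EC is absent)."""
    cost = problem.edge_cost or (lambda s, t: 1.0)
    return sum(cost(s, t) for s, t in zip(path, path[1:]))


# Toy problem (made up for this example): walk on the integers from 0 to 3.
toy = ConstructiveProblem(
    start=0,
    neighbours=lambda s: [s - 1, s + 1],
    is_goal=lambda s: s == 3,
    heuristic=lambda s: abs(3 - s),      # never overestimates the distance
    contract={"admissible": True, "solution_depth": 3},
)
solution = [0, 1, 2, 3]                  # a constructive solution: path to a goal
```

Here the contract is simply a dictionary of known problem properties, as suggested in the text.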

Definition 2 (Local graph search problem). A local graph search problem
consists of a state space $S$ together with the following efficiently computable
functions:

1. Neighbourhood $N : S \to 2^S$

2. Constraint $C : S \to \{0,1\}$

3. Objective function $Q : S \to \mathbb{R}$

4. Contract (continuity properties of the objective function)

Item 4 is optional. A local solution is a state $s \in S$ with $C(s) = 1$, and its solution
quality is $Q(s)$.

In local graph search problems, the goal is to find an $s \in S$ that satisfies
the constraints $C$ and achieves as high objective value as possible. An objective
function $Q$ is (Lipschitz) continuous with respect to the neighbourhood topology
with Lipschitz constant $d$ if $|Q(v_1) - Q(v_2)| \le d$ whenever $v_1$ and $v_2$ are neighbours.
Lipschitz continuity can make a problem easier, as it allows the objective values
of surrounding nodes to be estimated from the current target value.

The search for an optimal circuit layout is one example of a problem that
naturally formalises as a local graph search problem. Neighbours are reached
by modifying the current layout (changing one connection), and the objective
function incorporates the component cost and the energy efficiency of the
layout. The constraint disqualifies circuits that fail the specifications. Also, any
constructive search problem $G_1 = (S_1, N_1, C_1, EC)$ may be formulated as a local
search problem $G_2 = (S_2, N_2, C_2, Q)$, by letting

• $S_2$ be the set of paths in $G_1$,

• the objective function $Q$ be the negative sum of the path cost,

• the constraint $C_2$ check whether the last node of the path is a goal node,
and

• the neighbourhood function $N_2$ extend or contract a path by adding or
removing a final node according to $N_1$ (better choices of $N_2$ may be
available).

For example, the travelling salesman problem can be viewed as a constructive
problem where a path is built step-by-step, or as a local problem where a full path
is modified by swapping edges, and the objective function equals the summed
edge cost. Some potentially useful structure may be lost in the conversion from
a constructive to a local problem.

Although mixtures of local and constructive search problems are possible
(e.g., combining an objective function with a constructive solution and edge
cost), most practical graph search problems naturally formalise as either a
constructive or a local graph search problem.


Constraint Satisfaction Problems. Search problems can also often be naturally
cast as constraint satisfaction problems (CSPs). Blum and Roli (2003)
suggest that this offers a unified view of many search problems. A CSP
formulation also seems largely in line with what Pearl (1988) has in mind, although
Pearl is less specific.

Constructive graph problems may be formulated as a CSP by encoding the
neighbourhood relation with constraints, without significant loss of structure. (If
the state space is infinite, an infinite number of CSP variables may be necessary.)
For other problems, like the Quadratic Assignment Problem or Eternity II,¹ the
CSP formulation is more natural and retains more structure. More structure
makes the space of policies richer, and permits more clever strategies. It can
also make analytic results harder, however.


2.1 Algorithm performance 

A search algorithm is an algorithm that returns a solution (a state or a path)
to a graph search problem, given oracle access to the functions $N$ and $C$, and
possibly either $EC$ and $h$, or $Q$ (depending on the type of the search problem).

Performance on a single problem may be defined in terms of: 

1. Solution quality. 

2. The number of explored states; a state $s$ is considered explored if either
$N(s)$ or $C(s)$ has been called.

3. The running time of the algorithm. 

4. The memory consumption of the algorithm (typically measured by the 
maximum number of states kept in memory). 

To measure performance on a finite class of graph problems, average or worst- 
case performance may be used. For infinite classes, items 1 and 2 are typically 
subject to worst-case analysis (is the procedure complete/optimal), and 3 and 4 
to asymptotic worst-case or average-case analysis. 

We will focus on constructive search, and measure performance by the average
number of explored states until a goal is found. In many cases the number
of explored states is proportional to the actual runtime of the algorithm (state
expansion is often the dominant operation during search).

Search algorithms that always find a goal (when there is one) are called
complete, and algorithms that always find an optimal solution (when there is
one) are called optimal.

¹The Quadratic Assignment Problem is a classic combinatorial problem, and Eternity II is
a famous puzzle competition by TOMY UK Ltd.

3 Basic Search Algorithms


A plethora of search methods have been studied for both the constructive and the 
local search problems. We here review only a subset of the more important ones, 
and refer to the books (Russell and Norvig 2010; Edelkamp and Schrödl 2012)
for more details. First some preliminaries on trees, a fundamental structure in 
search analysis. 


3.1 Trees 

A rooted tree is a (directed) graph with a root $s_0$ where every pair of nodes is
connected by exactly one path. The level of a node $v$ is the distance from the
root $s_0$ to $v$. The depth $d$ is the length of a longest path starting from $s_0$. If
every node on level less than $D \in \mathbb{N}$ has exactly $b$ children, and nodes on level
$D$ are leaves (have no children), then the tree is complete with branching factor $b$
and depth $D$. Such a tree will have $b^D$ leaves and $(b^{D+1} - 1)/(b - 1)$ nodes. In
particular, complete binary trees (with branching factor 2) have $2^D$ leaves and
$2^{D+1} - 1$ nodes.
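
The node counts for complete trees can be checked with a few lines of Python. The function names below are chosen for this sketch only; the closed form is compared against a direct level-by-level sum.

```python
def leaves(b, D):
    """Number of leaves of a complete tree with branching factor b and depth D."""
    return b ** D


def nodes(b, D):
    """Closed-form node count (b^(D+1) - 1) / (b - 1), with b = 1 handled apart."""
    if b == 1:
        return D + 1
    return (b ** (D + 1) - 1) // (b - 1)


def nodes_by_levels(b, D):
    """Same count obtained by summing the b^level nodes on each level 0..D."""
    return sum(b ** level for level in range(D + 1))
```

For a complete binary tree of depth 3 this gives 8 leaves and 15 nodes.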

A node $v$ is a descendant of a node $u$ if there is a path from $u$ to $v$ (i.e., if $v$
is a child of a child of ... of $u$).

3.2 Uninformed search methods 

Uninformed search refers to the case where neither a heuristic function nor an
objective function is used for guidance of the search. The two standard methods
for exploring a graph in this case are Breadth-first Search (BFS) and Depth-first
Search (DFS). BFS searches a successively growing neighbourhood around the
start node, while DFS follows a single path as long as possible, and backtracks
when stuck. Algorithms 1 and 2 give pseudo-code for BFS and DFS respectively,
and Figure 1 shows the traversal order of BFS and DFS in a complete binary
tree.


Algorithm 1 Pseudo-code for BFS
    Q ← emptyQueue
    Discovered ← emptySet
    Q.add(start-node)
    Discovered.add(start-node)
    while Q not empty do
        u ← Q.pop()
        if C(u) then return u
        for v in N(u) do
            if not v ∈ Discovered then
                Q.add(v)
                Discovered.add(v)


DFS is substantially more memory-efficient than BFS; $O(d)$ compared to
$O(b^d)$. However, BFS can be emulated by an iterative deepening DFS, with the
same memory cost as DFS and only a small penalty in runtime in most graphs
(in graphs with exponentially growing neighbourhoods, to be precise).

Algorithm 2 Pseudo-code for (Recursive) DFS
    function DFS-REC(N, C, u, Discovered)
        Discovered.add(u)
        if C(u) then return u
        for v in N(u) do
            if not v ∈ Discovered then
                r ← DFS-REC(N, C, v, Discovered)
                if r ≠ none then return r
        return none
    Discovered ← emptySet
    DFS-REC(N, C, start-node, Discovered)



Figure 1: The difference between BFS (left) and DFS (right) in a complete
binary tree where a goal is placed in the second position on level 2 (the third
row). The numbers indicate traversal order. Circled nodes are explored before
the goal is found. Note how BFS and DFS explore different parts of the tree. In
bigger trees, this may lead to substantial differences in search performance.
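
The two traversals can also be written as a short runnable Python sketch that counts explored nodes. The tree below is an assumption made for the example: a complete binary tree of depth 3 in heap numbering (node i has children 2i+1 and 2i+2), with the goal on the second position of level 2, as in the figure caption. DFS is written with an explicit stack rather than recursion, which is a design choice of this sketch.

```python
from collections import deque


def bfs(neighbours, is_goal, start):
    """Graph-search BFS; returns (goal, number of explored nodes)."""
    queue = deque([start])
    discovered = {start}
    explored = 0
    while queue:
        u = queue.popleft()
        explored += 1                    # the goal check on u counts as exploring it
        if is_goal(u):
            return u, explored
        for v in neighbours(u):
            if v not in discovered:
                discovered.add(v)
                queue.append(v)
    return None, explored


def dfs(neighbours, is_goal, start):
    """Graph-search DFS with an explicit stack; same return convention."""
    stack = [start]
    discovered = set()
    explored = 0
    while stack:
        u = stack.pop()
        if u in discovered:
            continue
        discovered.add(u)
        explored += 1
        if is_goal(u):
            return u, explored
        # Push children in reverse so that the leftmost child is explored first.
        for v in reversed(list(neighbours(u))):
            if v not in discovered:
                stack.append(v)
    return None, explored


DEPTH = 3                                # assumed depth of the example tree
LAST = 2 ** (DEPTH + 1) - 2              # largest node index (nodes are 0..14)


def children(i):
    return [c for c in (2 * i + 1, 2 * i + 2) if c <= LAST]


goal = 4                                 # level 2 holds nodes 3, 4, 5, 6
bfs_result = bfs(children, lambda u: u == goal, 0)   # (4, 5)
dfs_result = dfs(children, lambda u: u == goal, 0)   # (4, 6)
```

Under these assumptions BFS explores 5 nodes and DFS explores 6 before the goal is found.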


BFS and DFS come in two flavors, depending on whether they keep track 
of visited nodes or not. The tree search variants do not keep track of visited 
nodes, while the graph search variants do. In trees (where each node can only be 
reached from one path), nothing is gained by keeping track of visited nodes. In 
contrast, keeping track of visited nodes can benefit search performance greatly in 
multiply connected graphs. Keeping track of visited nodes may also be expensive 
in terms of memory consumption, however. 

One way to understand tree search behaviour in general graphs is to say that 
tree search algorithms effectively explore a tree; branches in this tree correspond 
to paths in the original graph, and copies of the same node v will appear in 
several places of the tree whenever v can be reached through several paths. DFS 
tree search may search forever if there are cycles in the graph. We always assume 
that path lengths are bounded by a constant D. 

Depending on the positions of the goals in the graph, DFS and BFS may
have substantially different performance. From Section 5 onwards, we investigate
some simple models of how the goal position and the graph structure affect BFS
and DFS performance.

3.3 Informed Constructive Methods 

Informed constructive search methods make use of a heuristic function. This
often speeds up the search significantly. The most popular method for informed
constructive search is A*. It may be seen as a generalisation of BFS. A* combines
the heuristic information $h(v)$ of a node $v$ with the accumulated edge cost $g(v)$
of the shortest found path from the start node to $v$. A* always expands the
discovered node with the smallest $g(v) + h(v)$. When the heuristic function is
smaller for nodes closer to the goal, A* will prioritise more promising nodes and
find the goal much faster than BFS. If in particular the heuristic is consistent
(never overestimates distance and respects the triangle inequality), then A* is
guaranteed to find an optimal solution. The tree search version of A* only
requires the heuristic to be admissible (never overestimate search distance) for
guaranteed optimality.

One of the main drawbacks of A* is its memory consumption. Iterative
deepening A* (IDA*) uses a cutoff value $c$ that is incremented between iterations.
In each iteration, nodes $v$ not satisfying $g(v) + h(v) \le c$ are ignored, while the
rest are expanded in DFS manner. The memory consumption thus becomes on
par with DFS. However, unlike iterative deepening DFS, the runtime slowdown
of IDA* compared to A* may be exponentially worse, since it may only be
possible to increase the cutoff $c$ marginally between iterations if optimality of
the solution must be guaranteed. A range of other memory-efficient versions of
A* can be found in the literature.

If the edge cost is always 1 and the heuristic entirely uninformative h = 0, 
then A* reduces to BFS and IDA* to iterative-deepening DFS. 
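
The expansion rule of A* can be made concrete with a minimal Python sketch. It assumes a consistent heuristic (so that no state needs to be re-expanded); the function signature, the small weighted graph and the heuristic values are all made up for this example.

```python
import heapq
from itertools import count


def a_star(neighbours, is_goal, edge_cost, h, start):
    """A* graph search. Returns (path, cost), or (None, inf) if no goal is found.

    Assumes a consistent heuristic h, so a state never needs re-expansion.
    """
    tie = count()                        # tie-breaker: the heap never compares states
    frontier = [(h(start), next(tie), start)]
    g = {start: 0.0}                     # cheapest known path cost to each state
    parent = {start: None}
    closed = set()
    while frontier:
        _, _, u = heapq.heappop(frontier)    # smallest g(u) + h(u) first
        if u in closed:
            continue                     # stale queue entry
        closed.add(u)
        if is_goal(u):
            cost, path = g[u], []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1], cost
        for v in neighbours(u):
            new_g = g[u] + edge_cost(u, v)
            if v not in closed and new_g < g.get(v, float("inf")):
                g[v] = new_g
                parent[v] = u
                heapq.heappush(frontier, (new_g + h(v), next(tie), v))
    return None, float("inf")


# Small weighted graph and heuristic values (both invented for illustration).
GRAPH = {"s": {"a": 1, "b": 4}, "a": {"b": 1, "t": 5}, "b": {"t": 1}, "t": {}}
H = {"s": 2, "a": 2, "b": 1, "t": 0}

path, cost = a_star(GRAPH.__getitem__, lambda u: u == "t",
                    lambda u, v: GRAPH[u][v], H.__getitem__, "s")
# With h = 0 the queue is ordered by g alone (uniform-cost expansion).
blind_path, blind_cost = a_star(GRAPH.__getitem__, lambda u: u == "t",
                                lambda u, v: GRAPH[u][v], lambda u: 0, "s")
```

Both calls return the path s, a, b, t with cost 3; the heuristic only changes how many states are expanded on the way, not the optimality of the result.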

In many cases the priority is to find a goal as fast as possible, and the goal 
does not need to be an optimal one. In these cases, a greedy best-first search
strategy may be used, that expands nodes according to $h(v)$ instead of $g(v) + h(v)$.
Greedy best-first may be seen as the most natural generalisation of DFS to the 
informed constructive scenario. Beam-search is a variant of greedy best-first 
that searches slightly more widely. 

3.4 Informed Local Search 

Most informed local search algorithms strive to combine an exploiting,
hill-climbing component with an exploration component. The simplest one is
hill-climbing, which always goes to the neighbour with the highest objective value,
and randomly restarts when stuck. More advanced methods include simulated
annealing, which adds random moves to hill-climbing. The randomness
component decays over time. Genetic algorithms use a population of search nodes,
and try to find new search points by combining features of discovered ones.


4 Literature review 

We divide our review of related work into two subsections. The works in the
first subsection assume that a portfolio of predefined algorithms is given, and
only try to predict which algorithm in the portfolio is better for which problem.
The second subsection reviews approaches that try to build new search policies, 
possibly using a set of basic algorithms as building blocks. 


4.1 Feature-based Meta-heuristics 

The algorithm selection problem asks what algorithm best to use on a given
problem (Rice 1975; Kotthoff 2014). Tightly related is the question of inferring
the search time of different search algorithms on the problem, as this information
can be used to select the fastest algorithm. Both analytical investigations and
machine learning techniques applied to empirical data have been tried. The latter
is sometimes known as empirical performance models. The most comprehensive
surveys are given by Hutter et al. (2014) and Kotthoff (2014), and the PhD
theses Thompson (2011) and Arbelaez Rodriguez (2011).



For DFS, Knuth (1975) made a simple but important observation on how the
branching factor seen during search can be used to estimate the size of the
search tree and the runtime. Despite the simplicity of the scheme, the estimates
work surprisingly well in practice. Several generalisations have been developed
(Purdom 1978; Chen 1992). Kilby et al. (2006) generalise Knuth's method, and
also use it to select search policy for the SAT problem based on which search
policy has the lowest estimated runtime. Kullmann (2008) and Lelis et al. (2013)
both develop estimation schemes for branch-and-bound algorithms. Haim and
Walsh (2008) approach the SAT problem, and instead of branching factor use
properties of the given formula (such as the number and the size of clauses) to
predict search time and best search policy.
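
The idea of estimating tree size from the branching factors seen during search can be sketched in Python as follows. This is written in the spirit of Knuth's scheme as it is commonly described (random root-to-leaf probes whose branching factors are multiplied together); the function names, the number of probes and the example trees are assumptions of this sketch and are not taken from the cited papers.

```python
import random


def knuth_probe(children, root, rng):
    """One random root-to-leaf probe; returns a tree-size estimate."""
    estimate = 1                  # counts the root
    weight = 1                    # product of branching factors seen so far
    node = root
    while True:
        kids = list(children(node))
        if not kids:
            return estimate
        weight *= len(kids)       # estimated number of nodes on the next level
        estimate += weight
        node = rng.choice(kids)   # follow one child uniformly at random


def knuth_estimate(children, root, probes=1000, seed=0):
    """Average of several independent probes."""
    rng = random.Random(seed)
    return sum(knuth_probe(children, root, rng) for _ in range(probes)) / probes


# Example trees (invented): a complete binary tree of depth 3 in heap
# numbering, and a small irregular tree with 4 nodes.
def binary_children(i):
    return [c for c in (2 * i + 1, 2 * i + 2) if c <= 14]


SMALL = {0: [1, 2], 1: [3], 2: [], 3: []}
```

On the complete binary tree every probe sees branching factors 2, 2, 2 and returns exactly 15. On the irregular tree single probes return 5 or 3 with equal probability, so the average approaches the true size 4.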


In the case of informed search, Korf et al. (2001) developed an interesting
analytic technique for estimating the search time of IDA*. Assuming a consistent
heuristic function, the estimate is based on the distribution of heuristic function
values at different depths of the search tree, rather than heuristic accuracy.
Intuitively, the scheme works because the number of nodes expanded in each
iteration of IDA* depends on the number of nodes with heuristic value less than
the threshold. The distribution of heuristic values is also easy to estimate in
practice. Zahavi et al. (2010) generalise the work of Korf et al. to non-consistent
heuristics.

Many other approaches instead try to directly infer the best search policy,
without the intermediate step of estimating runtime. Fink (1998) does this
for STRIPS-like learning using only the problem size to infer which method
is likely to be more efficient. Schemes using much wider ranges of problem
properties are applied to CSPs in (Thompson 2011; Arbelaez Rodriguez 2011),
and to the NP-complete problems SAT, TSP and Mixed integer programming
(Hutter et al. 2014). Smith-Miles and Lopes (2012) review
commonly used features for the algorithm selection problem, mainly applied to
the local search scenario. They divide features into two main categories: General
and problem-specific. General features are usually phrased in terms of the fitness
landscape (i.e., the target function and the neighbourhood structure). A common
fitness landscape feature is for example the variability (ruggedness) of the target
function. Another general feature is the performance of a simple, fast algorithm
such as gradient descent. Problem-specific features are discussed for a range of
NP-complete problems such as TSP and Bin-packing.


4.2 Learning the search policy 


Explanation-based Learning (EBL) (DeJong and Mooney 1986; Mitchell et al.
1986; Minton 1988) is a general method for learning from examples and domain
knowledge. In the context of search, the domain knowledge is the neighbourhood
function (or the consequence of applying an ‘action’ to a state). An example to
learn from can be the search trace of an optimiser. The EBL learner analyses
the different decisions represented in the search trace, judges whether they were
good or bad, and tries to find the reason they were good or bad. Once a reason
has been found, the gained understanding can be used to pick similar good
decisions at an earlier point during the next search, and to avoid similar bad
decisions (decisions leading to paths where no goal will be found). EBL systems
have been applied to STRIPS-like planning scenarios (Minton 1988, 1990).

One characteristic feature of EBL is that it requires only one or a few
training examples (in addition to the domain knowledge). While attractive, it
can also lead to overspecific learning (Minton 1988). Partial Evaluation (PE)
is an alternative learning method that is more robust in this respect, with less
dependency on examples (Etzioni 1993). Leckie and Zukerman (1998) develop
a more inductive way to learn search control knowledge (in contrast to the
deductive generalisations performed by EBL and PE), where plenty of training
examples substitute for domain knowledge.


A more modern approach is known as hyper heuristics (Burke et al. 2003,
2013). It views the problem of inferring good search policies more abstractly.
Rather than interacting with the neighbourhood structure/graph problem
directly, the hyper heuristic only has access to a set of search policies for the
original graph problem. The search policies are known as low-level heuristics
in this literature (not to be confused with heuristic functions). The goal of the
hyper heuristic is to find a good policy for when to apply which low-level heuristic.
For example, Ross et al. (2002) used Genetic Algorithms to learn which
bin-packing heuristic to apply in which type of state in a bin-packing problem. The
learned hyper heuristic outperformed all the provided low-level heuristics used
by themselves. In applications of hyper heuristics, the low-level heuristics are
typically simple search policies provided by the human programmers, although
nothing prevents them from being arbitrarily advanced meta-heuristics. Some
research is also being done on automatic construction of low-level heuristics (see
(Burke et al. 2013) for references). A related approach directed at programming
in general is programming by optimisation (Hoos 2012), where machine learning
techniques are used to find the best algorithm in a space of programs delineated
by the human programmer.


5 Complete Binary Tree 


In a search graph, the neighbourhood relation $N$ induces a topology on the
state space $S$. The following two sections analytically explore how the structure
and depth of the graph and the distribution of the goals can be used to predict
the search performance of BFS and DFS. Figure 1 gives the intuition for the
different search strategies BFS and DFS, and how they initially focus the search
on different areas of the tree.

As a concrete example, consider the search problem of solving a Rubik’s cube.
There is an upper bound $D = 20$ to how many moves it can take to reach the
goal (Rokicki and Kociemba 2013). We may however suspect that most goals
are located around level 17 (±2 levels). If we consider search algorithms that
do not remember where they have been, the search space becomes a complete
tree with fixed branching factor 9. What would be the expected BFS and DFS
search time for this problem? Which one would be faster?

After some initial background in the first subsection, this section first
investigates a model where goals are located on a single goal level $g$ in Section 5.2
and then generalises it to multiple goal levels in Section 5.3. Section 6 develops
techniques for analyzing the performance of the graph search variants of BFS
and DFS, which recognize the path redundancies often present in problems. All
analytical runtime estimates are verified experimentally in Section 8.

5.1 Preliminaries 


For simplicity, we say that the runtime or search time of a search method (BFS
or DFS) is the number of nodes explored until a first goal is found (5 and 6
respectively in Figure 1). This simplifying assumption relies on node expansion
being the dominant operation, consuming similar time throughout the tree. If
no goal exists, the search method will explore all nodes before halting. In this
case, we define the runtime as the number of nodes in the search problem plus 1
(i.e., $2^{D+1}$ in the case of a binary tree of depth $D$).²

Let $\Gamma$ be the event that a goal exists, $\Gamma_k$ the event that a goal exists on level
$k$, and $\bar\Gamma$ and $\bar\Gamma_k$ their complements. Let $F_k = \Gamma_k \cap \bigl(\bigcap_{i<k} \bar\Gamma_i\bigr)$ be the event that
level $k$ has the first goal.

A random variable $X$ is geometrically distributed, $X \sim \mathrm{Geo}(p)$, if
$P(X = k) = (1 - p)^{k-1} p$ for $k \in \{1, 2, \dots\}$. The interpretation of $X$ is the
number of trials until the first success when each trial succeeds with probability $p$.
Its cumulative distribution function (CDF) is $P(X \le k) = 1 - (1 - p)^k$, and its
average or expected value is $E[X] = 1/p$. A random variable $Y$ is truncated
geometrically distributed, $Y \sim \mathrm{TruncGeo}(p, m)$, if $Y = (X \mid X \le m)$ for
$X \sim \mathrm{Geo}(p)$, which gives

\[ P(Y = k) = \begin{cases} \dfrac{(1 - p)^{k-1} p}{1 - (1 - p)^m} & \text{for } k \in \{1, \dots, m\} \\ 0 & \text{otherwise,} \end{cases} \]

\[ E[Y] = E[X \mid X \le m] = \frac{1 - (1 - p)^m (pm + 1)}{p \, (1 - (1 - p)^m)}. \]


Let $\mathrm{tc}(p, m)$ denote the expected value of a truncated geometrically distributed
variable with parameters $p$ and $m$ (i.e., $\mathrm{tc}(p, m) = E[Y]$). When $p \gg 1/m$,
$Y$ is approximately $\mathrm{Geo}(p)$, and $\mathrm{tc}(p, m) \approx 1/p$. When $p \ll 1/m$, $Y$ becomes
approximately uniform on $\{1, \dots, m\}$ and $\mathrm{tc}(p, m) \approx m/2$.
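
As a numerical sanity check, the closed form for the truncated geometric mean can be compared with a direct summation over the probability mass function. The Python sketch below uses function names chosen for this example and assumes 0 < p <= 1; for extremely small p the closed form suffers from floating-point cancellation, which this sketch does not address.

```python
def tc(p, m):
    """Mean of TruncGeo(p, m) from the closed-form expression (0 < p <= 1)."""
    q_m = (1.0 - p) ** m
    return (1.0 - q_m * (p * m + 1.0)) / (p * (1.0 - q_m))


def tc_direct(p, m):
    """Same mean computed by summing k * P(Y = k) for k = 1..m."""
    norm = 1.0 - (1.0 - p) ** m
    return sum(k * (1.0 - p) ** (k - 1) * p for k in range(1, m + 1)) / norm


# Regime p >> 1/m: close to the geometric mean 1/p.
large_p = tc(0.5, 1000)
# Regime p << 1/m: close to the mean of a uniform variable on {1, ..., m}.
small_p = tc(1e-4, 10)
```

In the first regime the value is essentially 1/p = 2. In the second it is close to (m + 1)/2 = 5.5, the exact mean of the uniform distribution on {1, ..., 10}, which is of the same order as the rougher m/2 approximation.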

A random variable $Z$ is exponentially distributed, $Z \sim \mathrm{Exp}(\lambda)$, if
$P(Z \le z) = 1 - e^{-\lambda z}$ for $z \ge 0$. The expected value of $Z$ is $1/\lambda$ and the
probability density function of $Z$ is $\lambda e^{-\lambda z}$. An exponential distribution with
parameter $\lambda = -\ln(1 - p)$ might be viewed as the continuous counterpart of a
$\mathrm{Geo}(p)$ distribution. We will use this approximation in Section 5.3.

Lemma 3 (Exponential approximation). Let $Z \sim \mathrm{Exp}(-\ln(1 - p))$ and
$X \sim \mathrm{Geo}(p)$. Then the CDFs for $X$ and $Z$ agree for integers $k$, $P(Z \le k) = P(X \le k)$.
The expectations of $Z$ and $X$ are also similar in the sense that $0 \le E[X] - E[Z] \le 1$.

Proof. For $z > 0$, $P(Z \le z) = 1 - \exp(z \ln(1 - p)) = 1 - (1 - p)^z$, and
$P(X \le z) = 1 - (1 - p)^{\lfloor z \rfloor}$. Thus, for integers $k > 0$, $P(Z \le k) = P(X \le k)$, which
proves the first statement. Further, $1 - (1 - p)^{\lfloor z \rfloor} \le 1 - (1 - p)^z \le 1 - (1 - p)^{\lfloor z + 1 \rfloor}$,
so $P(X \le z) \le P(Z \le z) \le P(X - 1 \le z)$. Hence $E[X] \ge E[Z] \ge E[X - 1] =
E[X] - 1$, which proves the second statement. □

²It may seem more justified to set the non-goal case to the exact number of nodes
instead of adding 1. However, adding 1 makes most expressions slightly more elegant, and
does not affect the results in any substantial way.

We will occasionally make use of the convention $0 \cdot \text{undefined} = 0$, and often
expand expectations by conditioning on disjoint events:

Lemma 4. Let $X$ be a random variable and let the sample space $\Omega = \bigcup_{i \in I} C_i$ be
partitioned by mutually disjoint events $C_i$. Then $E[X] = \sum_{i \in I} P(C_i) \, E[X \mid C_i]$.


5.2 Complete Binary Tree with a Single Goal Level 

Consider a binary tree of depth $D$ where solutions are distributed on a single
goal level $g \in \{0, \dots, D\}$. At the goal level, any node is a goal with iid probability
$p_g \in [0, 1]$. We will refer to this kind of problems as (single goal level) complete
binary trees with depth $D$, goal level $g$ and goal probability $p_g$ (Section 5.3
generalises the setup to multiple goal levels).

The probability that a goal exists is $P(\Gamma) = P(\Gamma_g) = 1 - (1 - p_g)^{2^g}$. If a
goal exists, let $Y$ be the position of the first goal at level $g$. Conditioned on a
goal existing, $Y$ is a truncated geometric variable $Y \sim \mathrm{TruncGeo}(p_g, 2^g)$. When
$p_g \gg 2^{-g}$ the goal position $Y$ is approximately $\mathrm{Geo}(p_g)$, which makes most
expressions slightly more elegant. This is often a realistic assumption, since if
$p_g$ is not much larger than $2^{-g}$, then often no goal would exist.


Proposition 5 (BFS runtime Single Goal Level). Let the problem be a complete
binary tree with depth $D$, goal level $g$ and goal probability $p_g$. When a goal exists
and has position $Y$ on the goal level, the BFS search time is

\[ t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g, Y) = 2^g - 1 + Y, \]

with expectation

\[ t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g \mid \Gamma_g) = 2^g - 1 + \mathrm{tc}(p_g, 2^g) \approx 2^g - 1 + \frac{1}{p_g}. \]

In general, when a goal does not necessarily exist, the expected BFS search time
is

\[ t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g) = P(\Gamma) \cdot (2^g - 1 + \mathrm{tc}(p_g, 2^g)) + P(\bar\Gamma) \cdot 2^{D+1} \approx 2^g - 1 + \frac{1}{p_g}. \]

The approximations are close when $p_g \gg 2^{-g}$.

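Proof. When a goal exists, BFS will explore all of the top of the tree until depth
$g - 1$ (that is, $2^{(g-1)+1} - 1 = 2^g - 1$ nodes) and $Y$ nodes on level $g$, before finding
the first goal. That is, $t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g, Y) = 2^g - 1 + Y$, with expected value
$2^g - 1 + \mathrm{tc}(p_g, 2^g)$.

In the general case, the expected value of the search time $X$ expands as

\[ \begin{aligned}
E[X] &= P(\Gamma) \cdot E[X \mid \Gamma] + P(\bar\Gamma) \cdot E[X \mid \bar\Gamma] \\
     &= P(\Gamma) \cdot t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g \mid \Gamma_g) + P(\bar\Gamma) \cdot 2^{D+1} \\
     &= P(\Gamma) \cdot (2^g - 1 + \mathrm{tc}(p_g, 2^g)) + P(\bar\Gamma) \cdot 2^{D+1}.
\end{aligned} \]

When $p_g \gg 2^{-g}$, then $P(\Gamma) \approx 1$, $P(\bar\Gamma) \approx 0$ and $Y$ is approximately
$\mathrm{Geo}(p_g)$, which justifies the approximation. □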
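
The expected BFS search time can be evaluated numerically. The Python sketch below computes it from the closed form and, as a cross-check, by summing directly over all possible positions of the first goal. The function names and the parameter values in the usage lines are assumptions made for this example.

```python
def tc(p, m):
    """Mean of TruncGeo(p, m), for 0 < p <= 1."""
    q_m = (1.0 - p) ** m
    return (1.0 - q_m * (p * m + 1.0)) / (p * (1.0 - q_m))


def bfs_time(D, g, p):
    """Expected BFS search time when a goal does not necessarily exist."""
    m = 2 ** g                              # number of nodes on the goal level
    p_goal = 1.0 - (1.0 - p) ** m           # probability that a goal exists
    return p_goal * (m - 1 + tc(p, m)) + (1.0 - p_goal) * 2 ** (D + 1)


def bfs_time_enumerated(D, g, p):
    """Same expectation, summing over the position y of the first goal."""
    m = 2 ** g
    found = sum((1.0 - p) ** (y - 1) * p * (m - 1 + y) for y in range(1, m + 1))
    return found + (1.0 - p) ** m * 2 ** (D + 1)


def bfs_time_approx(g, p):
    """Approximation 2^g - 1 + 1/p, intended for p much larger than 2^-g."""
    return 2 ** g - 1 + 1.0 / p


exact = bfs_time(10, 5, 0.5)                # depth 10, goal level 5, p_g = 0.5
approx = bfs_time_approx(5, 0.5)            # 31 + 2 = 33
```

With a goal probability of 0.5 on level 5, a goal is almost certain to exist and the exact value and the approximation agree to several decimals.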


A memory-efficient tree-search variant of BFS can be implemented as iterative
deepening DFS (ID-DFS). The runtime of ID-DFS is about twice the runtime of
BFS; our results are only marginally affected by this. Korf et al. (2001) study
IDA*, which may be seen as a generalised ID-DFS. Their Theorem 1 gives a
similar result to our Proposition 5 by setting: the heuristic $h = 0$, the number of
$i$-level nodes $N_i = 2^i$, the equilibrium distribution $P(x) = 1$, the edge cost $= 1$,
and the cost bound $c$ equal to our max depth $D$. Their bound then comes out
as $t_{\mathrm{SGL}}(g) = 2^{g+1} - 1$, and as $\approx 2^{g+2}$ after iteration over all levels $\le g$.
This corresponds to the worst case in our scenario.

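Proposition 6. Consider a complete binary tree with depth $D$, goal level $g$ and goal probability $p_g$. When a goal exists and has position $Y$ on the goal level, the DFS search time is approximately

$$\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, g, p_g, Y) := (Y - 1)\, 2^{D-g+1} + 2,$$

with expectation

$$\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, g, p_g \mid \Gamma_g) := \left(\frac{1}{p_g} - 1\right) 2^{D-g+1} + 2.$$

When $p_g \gg 2^{-g}$, the expected DFS search time when a goal does not necessarily exist is approximately

$$\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, g, p_g) := P(\Gamma)\left((\mathrm{tc}(p_g, 2^g) - 1)\, 2^{D-g+1} + 2\right) + P(\bar\Gamma)\, 2^{D+1} \approx \left(\frac{1}{p_g} - 1\right) 2^{D-g+1}.$$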

Proof. One way to count the nodes explored by DFS when a goal exists is the following. To the left of the first goal on level $g$, DFS will explore $2(Y - 1)$ subtrees rooted at level $g + 1$. These subtrees will have depth $D - (g + 1)$, and contain $2^{D-g} - 1$ nodes each. DFS will also explore $Y$ nodes on level $g$ and their parents, which amounts to about $2Y$ nodes. Summing the contributions up gives the DFS search time approximation

$$\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, g, p_g, Y) = 2(Y - 1) \cdot (2^{D-g} - 1) + 2Y = (Y - 1)\, 2^{D-g+1} + 2.$$

By Lemma 4, the expected value of the search time $X$ expands as

$$\begin{aligned}
E[X] &= P(\Gamma) \cdot E[X \mid \Gamma] + P(\bar\Gamma) \cdot E[X \mid \bar\Gamma] \\
     &= P(\Gamma) \cdot E\!\left[\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, g, p_g, Y) \mid \Gamma\right] + P(\bar\Gamma) \cdot 2^{D+1} \\
     &= P(\Gamma) \cdot \left((\mathrm{tc}(p_g, 2^g) - 1)\, 2^{D-g+1} + 2\right) + P(\bar\Gamma) \cdot 2^{D+1},
\end{aligned}$$

where the last step uses that $(Y \mid \Gamma) \sim \mathrm{TruncGeo}(p_g, 2^g)$. When $p_g \gg 2^{-g}$, then $P(\Gamma) \approx 1$, $P(\bar\Gamma) \approx 0$ and $Y$ is approximately $\mathrm{Geo}(p_g)$, which justifies the approximation. □
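The DFS estimates of Proposition 6 can be sketched in Python in the same way. As before, `tc` is assumed to be the mean of the truncated geometric distribution, and all function names are hypothetical.

```python
def tc(p: float, m: int) -> float:
    """Mean of a Geo(p) variable conditioned on lying in {1, ..., m} (assumed meaning of tc)."""
    q = 1.0 - p
    return (1.0 - q**m * (1.0 + m * p)) / (p * (1.0 - q**m))


def dfs_sgl_at_position(D: int, g: int, Y: int) -> int:
    """Approximate DFS search time when the first goal is the Y-th node on level g."""
    return (Y - 1) * 2 ** (D - g + 1) + 2


def dfs_sgl_given_goal(D: int, g: int, p: float) -> float:
    """Expected approximate DFS search time conditioned on a goal existing."""
    return (tc(p, 2**g) - 1.0) * 2 ** (D - g + 1) + 2


def dfs_sgl(D: int, g: int, p: float) -> float:
    """Expected approximate DFS search time when a goal does not necessarily exist."""
    p_goal = 1.0 - (1.0 - p) ** (2**g)
    return p_goal * dfs_sgl_given_goal(D, g, p) + (1.0 - p_goal) * 2 ** (D + 1)
```

With D = 14, g = 11 and p_g = 0.001, `dfs_sgl_given_goal` evaluates to a value close to 11140, which appears to match the corresponding analytical entry reported in Table 1.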


Figure 2 shows the runtime estimates as a function of goal level. 


Comparison BFS vs. DFS. Figure 6 on page 23 shows how the expected search time varies with goal depth (and also compares the results with the Binary Grammar Problem described in Section 7.1). The runtime estimates can be used to predict whether BFS or DFS will be faster, given the parameters $D$, $g$, and $p_g$, as stated in the next Proposition.


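Proposition 7. Let $\gamma_{p_g} = \log_2\!\left(\mathrm{tc}(p_g, 2^g) - 1\right)/2 \approx \log_2\!\left(\frac{1 - p_g}{p_g}\right)/2$. Given the approximation of DFS runtime of Proposition 6, BFS wins in expectation in a complete binary tree with depth $D$, goal level $g$ and goal probability $p_g$ when

$$g < \frac{D}{2} + \gamma_{p_g},$$

and DFS wins in expectation when $g > \frac{D}{2} + \gamma_{p_g} + \frac{1}{2}$.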
[Figure 2. Panels: Expected Search Time (left), Decision Boundary (right).]

Figure 2: Two plots of how expected BFS and DFS search time varies in a complete binary tree with a single goal level $g$ and goal probability $p_g = 0.07$. The left plot depicts search time as a function of goal level in a tree of depth 15. BFS has the advantage when the goal is in the higher regions of the graph, although at first the probability that no goal exists heavily influences both BFS and DFS search time. DFS search time improves as the goal moves downwards since the goal probability is held constant. The right plot shows the decision boundary of Proposition 7 together with 100 empirical outcomes of BFS and DFS search time according to the varied parameters $g \in [3, D] \cap \mathbb{N}$ and $D \in [4, 15] \cap \mathbb{N}$. The decision boundary gets 79% of the winners correct.


The term $\gamma_{p_g}$ is in the range $[-1, 1]$ when $p_g \in [0.2, 0.75]$, $g \ge 2$, in which case Proposition 7 roughly says that BFS wins (in expectation) when the goal level $g$ is located higher than the middle of the tree. For smaller $p_g$, BFS benefits, with the boundary level being shifted $\gamma_{p_g} \approx k/2$ levels from the middle when $p_g \approx 2^{-k} \gg 2^{-g}$. Figure 2 illustrates the prediction as a function of goal depth and tree depth for a fixed probability $p_g = 0.07$.


Proof of Proposition 7. When no goal exists, BFS and DFS will perform the same. When the tree contains at least one goal node, BFS will find the goal somewhere on its sweep across level $g$, so the BFS runtime is bounded by $2^g \le t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g) \le 2^{g+1}$.

The upper bound for $t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g)$ gives that $t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g) < \tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, g, p_g)$ when $2^{g+1} < (\mathrm{tc}(p_g, 2^g) - 1)\, 2^{D-g+1}$. Taking the binary logarithm of both sides yields

$$g + 1 < \log_2(\mathrm{tc}(p_g, 2^g) - 1) + D - g + 1.$$

Collecting the $g$'s on one side and dividing by 2 gives the desired bound

$$g < \frac{\log_2(\mathrm{tc}(p_g, 2^g) - 1)}{2} + \frac{D}{2} = \frac{D}{2} + \gamma_{p_g}.$$

Similar calculations with the lower bound for $t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g)$ give the condition $g > \frac{D}{2} + \gamma_{p_g} + \frac{1}{2}$ for $\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, g, p_g) < t^{\mathrm{BFS}}_{\mathrm{SGL}}(g, p_g)$. □

It is straightforward to generalise the calculations to arbitrary branching factor $b$, by just substituting the 2 in the base of $t^{\mathrm{BFS}}_{\mathrm{SGL}}$ and $\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}$ for $b$. In Proposition 7 the change only affects the base of the logarithm in $\gamma_{p_g}$.
Corollary 8. Given the above approximations to BFS and DFS runtime, BFS wins in expectation in a complete tree with integer branching factor $b \ge 2$, depth $D$, goal level $g$, and goal probability $p_g$ when $g < \frac{D}{2} + \gamma_{b,p_g}$, and DFS wins in expectation when $g > \frac{D}{2} + \gamma_{b,p_g} + \frac{1}{2}$, where $\gamma_{b,p_g} = \log_b\!\left(\mathrm{tc}(p_g, b^g) - 1\right)/2 \approx \log_b\!\left(\frac{1 - p_g}{p_g}\right)/2$.
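The decision rule of Proposition 7 and Corollary 8 can be written as a small Python function. The sketch uses the approximate form of the shift term (with 1/p_g in place of tc), which is only meant for p_g well above b^{-g}; the names `gamma` and `predict_winner` and the "undecided" label for the gap between the two bounds are illustrative choices.

```python
import math


def gamma(p: float, b: int = 2) -> float:
    """Approximate shift of the decision boundary: log_b((1 - p) / p) / 2."""
    return math.log((1.0 - p) / p, b) / 2.0


def predict_winner(D: int, g: int, p: float, b: int = 2) -> str:
    """Predict the faster algorithm in expectation for a single goal level g."""
    boundary = D / 2.0 + gamma(p, b)
    if g < boundary:
        return "BFS"
    if g > boundary + 0.5:
        return "DFS"
    return "undecided"  # the two bounds leave a gap of half a level
```

For example, with D = 14 and p_g = 0.07 the boundary lies a little below level 9, so a goal level of 4 is predicted to favour BFS and a goal level of 12 to favour DFS.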

5.3 Complete Binary Tree with Multiple Goal Levels 

We now generalise the model developed in the previous section to problems that can have goals on any number of levels. For each level $k \in \{0, \dots, D\}$, let $p_k$ be the associated goal probability. Not every $p_k$ should be equal to 0. Nodes on level $k$ have iid probability $p_k$ of being a goal. We will refer to problems of this kind as (multi goal level) complete binary trees with depth $D$ and goal probabilities $p$.


5.3.1 DFS Analysis 

Our approximation of DFS performance in the case of multiple goal levels approximates the geometric distribution used in Proposition 6 with an exponential distribution (its continuous approximation by Lemma 3).

Proposition 9 (Expected Multi Goal Level DFS Performance). Consider a complete binary tree of depth $D$ with goal probabilities $p = [p_0, \dots, p_D] \in [0, 1)^{D+1}$. If for at least one $j$, $p_j \gg 2^{-j}$, and for all $k$, $p_k \ll 1$, then the expected number of nodes DFS will search is approximately

$$\tilde t^{\mathrm{DFS}}_{\mathrm{MGL}}(D, p) := 1 \Big/ \sum_{k=0}^{D} \ln\!\left((1 - p_k)^{-1}\right) 2^{-(D-k+1)}.$$


The proof constructs for each level k an exponential random variable Xk that 
approximates the search time before a goal is found on level k (disregarding goals 
on other levels). The minimum of all Xk then becomes an approximation of the 
search time to find a goal on any level. The approximations use exponential 
variables for easy minimisation. 


Proof. The proof uses two approximations. First approximate the position of the first goal on level $k$ with $Y_k \sim \mathrm{Exp}(\lambda_k)$, where $\lambda_k = -\ln(1 - p_k)$. This is reasonable for the following reason. For $p_k \gg 2^{-k}$, $Y_k$ is approximately $\mathrm{Geo}(p_k)$, so the approximation is justifiable by Lemma 3. For smaller $p_k$, the probability that this level has a goal is small, so the imprecision of the approximation does not affect the result significantly (as long as not all $p_j$ are this small).

Second, disregarding goals on levels other than $k$, the total number of nodes that DFS needs to search before reaching a goal on level $k$ is approximately $X_k \sim \mathrm{Exp}(\lambda_k 2^{-(D-k+1)})$. This follows from an approximation of Proposition 6. The number of nodes DFS needs to search to find a goal on level $k$ is

$$\tilde t^{\mathrm{DFS}}_{\mathrm{SGL}}(D, k, p_k, Y_k) = (Y_k - 1)\, 2^{D-k+1} + 2 \approx Y_k \cdot 2^{D-k+1}.$$

(This is a reasonable estimate if $Y_k$ is large, which is likely given that $p_k \ll 1$ by assumption.) So $X_k$ is approximately a multiple $2^{D-k+1}$ of $Y_k$. For any exponential random variable $Z$ with parameter $\lambda$, the scaled variable $m \cdot Z$ is $\mathrm{Exp}(\lambda/m)$. This completes the justification of the second approximation.

The result now follows by a standard minimisation of exponential variables. Since $X_k$ approximates the number of nodes searched before finding a goal on level $k$, the number of nodes searched before finding a goal on any level is $X = \min_k X_k$. The CDF for $X$ is

$$\begin{aligned}
P(X \le y) &= 1 - \prod_{k=0}^{D} P(X_k > y) \\
           &= 1 - \prod_{k=0}^{D} \exp\!\left(-\lambda_k 2^{-(D-k+1)} y\right) \\
           &= 1 - \exp\!\left(-y \sum_{k=0}^{D} \lambda_k 2^{-(D-k+1)}\right).
\end{aligned}$$

(The minimum of exponential variables $X_k \sim \mathrm{Exp}(\xi_k)$ is again an exponential variable $\mathrm{Exp}(\sum_k \xi_k)$.)

So $X \sim \mathrm{Exp}\!\left(\sum_{k=0}^{D} \lambda_k 2^{-(D-k+1)}\right)$, with the claimed expected value $1 \big/ \sum_{k=0}^{D} \lambda_k 2^{-(D-k+1)}$. □


In the special case of a single goal level, the approximation of Proposition 9 is similar to the one given by Proposition 6. When $p$ only has a single element $p_j \ne 0$, the expression simplifies to

$$\tilde t^{\mathrm{DFS}}_{\mathrm{MGL}}(D, p) = \frac{1}{\lambda_j}\, 2^{D-j+1} = -\frac{1}{\ln(1 - p_j)}\, 2^{D-j+1}.$$

For $p_j$ not close to 1, the factor $-1/\ln(1 - p_j)$ is approximately the same as the corresponding factor $1/p_j - 1$ in Proposition 6 (the Laurent expansion is $-1/\ln(1 - p_j) = 1/p_j - 1/2 + O(p_j)$).
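A minimal Python sketch of the estimate in Proposition 9 is given here, together with the single goal level special case. The function name `dfs_mgl` is hypothetical, and the list `p` is assumed to hold one probability per level 0, ..., D.

```python
import math


def dfs_mgl(D: int, p: list) -> float:
    """Approximate expected DFS search time with goal probabilities p[0..D]."""
    # Rate of the exponential variable approximating the overall search time:
    # the sum over levels of lambda_k * 2^-(D-k+1), with lambda_k = -ln(1 - p_k).
    rate = sum(-math.log(1.0 - pk) * 2.0 ** -(D - k + 1) for k, pk in enumerate(p))
    if rate <= 0.0:
        raise ValueError("at least one goal probability must be positive")
    return 1.0 / rate


# Single goal level j = 11 with p_j = 0.01 in a tree of depth 14.
p = [0.0] * 15
p[11] = 0.01
multi_level_estimate = dfs_mgl(14, p)
single_level_estimate = (1.0 / 0.01 - 1.0) * 2 ** (14 - 11 + 1) + 2  # Proposition 6
```

In this example the two estimates differ by well under one percent, in line with the Laurent expansion quoted above.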


5.3.2 BFS Analysis 

The corresponding expected search time $t^{\mathrm{BFS}}_{\mathrm{MGL}}(p)$ for BFS requires less insight and can be calculated exactly by conditioning on which level the first goal is. The resulting formula is less elegant, however. The same technique cannot be used for DFS, since DFS does not exhaust levels one by one.

The probability that level $k$ has the first goal is $P(F_k) = P(\Gamma_k) \prod_{i=0}^{k-1} P(\bar\Gamma_i)$, where $P(\Gamma_i) = 1 - (1 - p_i)^{2^i}$. The expected BFS search time gets a more uniform expression by the introduction of an extra hypothetical level $D + 1$ where all nodes are goals. That is, level $D + 1$ has goal probability $p_{D+1} = 1$ and $P(F_{D+1}) = P(\bar\Gamma) = 1 - \sum_{k=0}^{D} P(F_k)$.

Proposition 10 (Expected Multi Goal Level BFS Performance). The expected number of nodes $t^{\mathrm{BFS}}_{\mathrm{MGL}}(p)$ that BFS needs to search to find a goal in a complete binary tree of depth $D$ with goal probabilities $p = [p_0, \dots, p_D]$, $p \ne 0$, is

$$t^{\mathrm{BFS}}_{\mathrm{MGL}}(p) = \sum_{k=0}^{D+1} P(F_k)\, t^{\mathrm{BFS}}_{\mathrm{SGL}}(k, p_k \mid \Gamma_k) \approx \sum_{k=0}^{D+1} P(F_k) \left(2^k + \frac{1}{p_k}\right).$$

For $p_k = 0$, the expressions $t^{\mathrm{BFS}}_{\mathrm{SGL}}(k, p_k)$ and $1/p_k$ will be undefined, but this only occurs when $P(F_k)$ is also 0.
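The exact sum in Proposition 10 is straightforward to evaluate. The Python sketch below assumes, as before, that tc(p, m) is the mean of a geometric variable truncated to {1, ..., m}; the function names are illustrative. Levels with P(F_k) = 0 are skipped, which sidesteps the undefined terms mentioned above.

```python
def tc(p: float, m: int) -> float:
    """Mean of a Geo(p) variable conditioned on lying in {1, ..., m} (assumed meaning of tc)."""
    q = 1.0 - p
    return (1.0 - q**m * (1.0 + m * p)) / (p * (1.0 - q**m))


def first_goal_level_probs(p: list) -> list:
    """Return P(F_0), ..., P(F_D), P(F_{D+1}), where the last entry is P(no goal)."""
    probs, none_above = [], 1.0
    for k, pk in enumerate(p):
        has_goal = 1.0 - (1.0 - pk) ** (2**k)   # P(Gamma_k)
        probs.append(has_goal * none_above)
        none_above *= 1.0 - has_goal
    probs.append(none_above)                    # hypothetical level D + 1
    return probs


def bfs_mgl(p: list) -> float:
    """Expected BFS search time with goal probabilities p[0..D]."""
    extended = list(p) + [1.0]                  # level D + 1 has goal probability 1
    total = 0.0
    for k, (f_k, pk) in enumerate(zip(first_goal_level_probs(p), extended)):
        if f_k > 0.0:
            total += f_k * (2**k - 1 + tc(pk, 2**k))
    return total
```

When only one level has a non-zero goal probability, the result coincides with the general expression of Proposition 5.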
Figure 3: The difference between BFS (left) and DFS (right) in multiply connected graphs. Note how DFS is additionally concentrated at the bottom of the graph compared to Figure 1.


Proof. To BFS, the event that level $k$ has a goal is equivalent to the single goal level model of Section 5.2. Let $X$ be BFS search time, and let $(X \mid F_k)$ be the number of nodes that BFS needs to search when $k$ is the first level with a goal. Then $(X \mid F_k) = t^{\mathrm{BFS}}_{\mathrm{SGL}}(k, p_k, X - (2^k - 1) \mid \Gamma_k)$, and $E[X \mid F_k] = t^{\mathrm{BFS}}_{\mathrm{SGL}}(k, p_k \mid \Gamma_k)$. The result follows by expanding $E[X]$ over $F_0, \dots, F_{D+1}$ as in Lemma 4. □


The approximation tends to be within a factor 2 of the correct expression, even when $p_k < 2^{-k}$ for some or all $p_k \in p$. The reason is that the corresponding $P(F_k)$'s are small when the geometric approximation is inaccurate.

Both Proposition 9 and 10 naturally generalise to arbitrary branching factor $b$. Although their combination does not yield a similarly elegant expression as Proposition 7, they can still be naively combined to predict the BFS vs. DFS winner (Figure 8).


6 Colliding Branches 

The last section predicted the runtime of tree search algorithms that do not remember which nodes they visit, which means that the search graph always has the shape of a tree (with the same node possibly occurring in several places). In this section, we explore the performance of graph search algorithms that avoid revisiting previously explored nodes by keeping track of which nodes have already been seen. Figure 3 gives an idea of the difference between BFS and DFS in multiply connected graphs (with bounded search depth).

Definition 11. For a given search problem: Let the level of a node $v$, $\mathrm{level}(v)$, be the length of a shortest path from the start node to $v$. Let $D = \max_v \mathrm{level}(v)$ be the (generalised) depth of the search graph. Let $\delta_n$ be the first node on level $n$ reached by DFS, $0 \le n \le D$.

The descendant counter $L$ plays a central role in the analysis. For a given search problem, let

$$L(n, d) = |\{v : \mathrm{level}(v) = d,\ v \in \mathrm{descendants}(\delta_n)\}|$$

count the number of nodes on level $d$ that are reachable from $\delta_n$.

As in the previous section, we assume that goals are distributed by level in an iid manner according to a goal probability vector $p$. We will also assume that the probability of DFS finding a goal before finding $\delta_D$ is negligible. We will refer to problems of this kind as search problems with depth $D$, goal probabilities $p$ and descendant counter $L$. The rest of this section justifies the following proposition.

Proposition 12. The DFS and BFS runtime of a search problem can be roughly estimated from the descendant counter $L$, the depth $D$ and the goal probabilities $p = [p_0, \dots, p_D]$ when the probability of finding a goal before $\delta_D$ is negligible.

The assumption of DFS not finding a goal before $\delta_D$ is not always realistic, but is for example satisfied in the grammar problems considered in Section 7 below.

6.1 DFS Analysis 

The nodes $\delta_0, \dots, \delta_D$ play a central role in the analysis of DFS runtime, since all the descendants of $\delta_{n+1}$ will be explored before the descendants of $\delta_n$ (excluding the $\delta_{n+1}$ descendants). We say that DFS explores from $\delta_n$ after DFS has explored all descendants of $\delta_{n+1}$ and until all descendants of $\delta_n$ have been explored. The general idea of the DFS analysis will be to count the number of nodes under each $\delta_n$ and to compute the probability that any of these nodes is a goal.

Some notation for this:

• Let the $\delta_n$-subgraph $S_n = \{v : v \in \mathrm{descendants}(\delta_n)\}$ be the set of nodes reachable from $\delta_n$, with cardinality $|S_n| = \sum_{i=0}^{D} L(n, i)$, $0 \le n \le D$. Let $S_{D+1} = \emptyset$ and let $S_{-1}$ be a set of cardinality $|S_{-1}| = |S_0| + 1 = \sum_{i=0}^{D} L(0, i) + 1$.

• Let the $\delta_n$-explorables $T_n = S_n \setminus S_{n+1}$ be the nodes explored from $\delta_n$.

• Let the number of level-$d$ $\delta_n$-explorables $A_{n,d} = L(n, d) - L(n + 1, d)$ be the number of level $d$ descendants of $\delta_n$ that are not descendants of $\delta_{n+1}$, for $0 \le n, d \le D$. The relation between $T_n$ and $A_{n,d}$ is the following: $|T_n| = \sum_{i=n}^{D} A_{n,i}$.

Let $q_k = 1 - p_k$ for $0 \le k \le D$.

Lemma 13. Consider a search problem with depth $D$, goal probabilities $p$, and descendant counter $L$. The probability that the $\delta_n$-explorables $T_n$ contain a goal is $\tau_n := 1 - \prod_{k=0}^{D} q_k^{A_{n,k}}$, and the probability that $T_n$ contains the first goal is $\phi_n := \tau_n \prod_{i=n+1}^{D} (1 - \tau_i)$.

Proof. $\tau_n$ is 1 minus the probability of not hitting a goal at any level $d$, $n \le d \le D$, since at each level $d$, $A_{n,d}$ probes are made when exploring from $\delta_n$. □

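Proposition 14 (Colliding branches expected DFS search time). The expected DFS search time $t^{\mathrm{DFS}}_{\mathrm{CB}}(D, p, L)$ in a search problem with depth $D$, goal probabilities $p$, and descendant counter $L$ is bounded by

$$t^{\mathrm{DFS}}_{\mathrm{CBL}}(D, p, L) := \sum_{n=-1}^{D} |S_{n+1}|\, \phi_n \le t^{\mathrm{DFS}}_{\mathrm{CB}}(D, p, L) \le \sum_{n=-1}^{D} |S_n|\, \phi_n =: t^{\mathrm{DFS}}_{\mathrm{CBU}}(D, p, L),$$

where $\phi_{-1} := P(\bar\Gamma) = 1 - \sum_{n=0}^{D} \phi_n$ is the probability that no goal exists.

The arithmetic mean $\tilde t^{\mathrm{DFS}}_{\mathrm{CB}}(D, p, L) := \left(t^{\mathrm{DFS}}_{\mathrm{CBL}}(D, p, L) + t^{\mathrm{DFS}}_{\mathrm{CBU}}(D, p, L)\right)/2$ between the bounds can be used for a single runtime estimate.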

Proof. Let $X$ be the DFS search time in a search problem with the features described above. The expectation of $X$ may be decomposed as

$$E[X] = P(\bar\Gamma)\, E[X \mid \bar\Gamma] + \sum_{n=0}^{D} P(\text{first goal in } T_n) \cdot E[X \mid \text{first goal in } T_n]. \qquad (1)$$

The conditional search time $(X \mid \text{first goal in } T_n)$ is bounded by $|S_{n+1}| \le (X \mid \text{first goal in } T_n) \le |S_n|$ for $0 \le n \le D$, since to find a goal DFS will search the entire $\delta_{n+1}$-subgraph $S_{n+1}$ before finding it when searching the $\delta_n$-explorables $T_n$, but will not need to search more than the $\delta_n$-subgraph $S_n = S_{n+1} \cup T_n$ (disregarding the few probes made 'on the way down to' $\delta_n$ (i.e. to $T_n$); these probes were assumed negligible). The same bounds also hold with $S_0$ and $S_{-1}$ when no goal exists (recall that $|S_{-1}| := |S_0| + 1$). Therefore the conditional expectation satisfies

$$|S_{n+1}| \le E[X \mid \text{first goal in } T_n] \le |S_n| \qquad (2)$$

for $-1 \le n \le D$. By Lemma 13, the probability that the first goal is among the $\delta_n$-explorables $T_n$ is $\phi_n$, and the probability $P(\bar\Gamma)$ that no goal exists is $\phi_{-1}$ by definition.

Substituting these probabilities and (2) into (1) gives the desired bounds for the expected DFS search time $t^{\mathrm{DFS}}_{\mathrm{CB}}(D, p, L) = E[X]$. □
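The bounds of Proposition 14 can be computed directly from a descendant counter. The Python sketch below is one possible reading of the definitions in Section 6.1: `L` is assumed to be a callable with L(n, d) = 0 for d < n, `p` holds one goal probability per level 0, ..., D, and the function name `dfs_cb_bounds` is hypothetical.

```python
def dfs_cb_bounds(D: int, p: list, L) -> tuple:
    """Return (lower bound, upper bound, arithmetic mean) of expected DFS search time."""

    def counter(n: int, d: int) -> int:
        return 0 if n > D else L(n, d)          # S_{D+1} is empty

    def size(n: int) -> int:
        """|S_n|, with |S_{-1}| = |S_0| + 1 and |S_{D+1}| = 0."""
        if n == -1:
            return size(0) + 1
        return sum(counter(n, d) for d in range(D + 1))

    # tau[n]: probability that the delta_n-explorables T_n contain a goal.
    tau = []
    for n in range(D + 1):
        no_goal = 1.0
        for d in range(D + 1):
            explorables = counter(n, d) - counter(n + 1, d)   # A_{n,d}
            no_goal *= (1.0 - p[d]) ** explorables
        tau.append(1.0 - no_goal)

    # phi[n]: probability that T_n contains the first goal. DFS explores
    # T_D first, then T_{D-1}, and so on; phi[-1] is P(no goal).
    phi, none_deeper = {}, 1.0
    for n in range(D, -1, -1):
        phi[n] = tau[n] * none_deeper
        none_deeper *= 1.0 - tau[n]
    phi[-1] = none_deeper

    lower = sum(size(n + 1) * phi[n] for n in range(-1, D + 1))
    upper = sum(size(n) * phi[n] for n in range(-1, D + 1))
    return lower, upper, (lower + upper) / 2.0
```

As a sanity check, a complete binary tree of depth 3 without goals can be described by L(n, d) = 2^{d-n} for d >= n; the bounds then collapse to the 15 nodes of the tree and 15 + 1.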

The informativeness of the bounds of Proposition 14 depends on the dispersion of nodes between the different $T_n$'s. If most nodes belong to one or a few sets $T_n$, the bounds may be almost completely uninformative. This happens in the special case of complete trees with branching factor $b$, where a fraction $(b - 1)/b$ of the nodes will be in $T_0$. The previous section derives techniques for these cases. The grammar problems investigated in Section 7 below show that the bounds may be relevant in more connected graphs, however.

6.2 BFS Analysis 

The analysis of BFS only requires the descendant counter $L(0, \cdot)$ with the first argument set to 0, and follows the same structure as Section 5.3.2. In contrast to the DFS bounds above, this analysis gives a precise expression for the expected runtime. The idea is to count the number of nodes in the upper $k$ levels of the tree (derived from $L(0, 0), \dots, L(0, k)$), and to compute the probability that they contain a goal. Let the upper subgraph $U_k = \sum_{i=0}^{k-1} L(0, i)$ be the number of nodes above level $k$. When there is only a single goal level, Proposition 5 naturally generalises to the more general setting of this section.

Lemma 15 (BFS runtime Single Goal Level). For a search problem with depth $D$ and descendant counter $L$, assume that the problem has a single goal level $g$ with goal probability $p_g$, and that $p_j = 0$ for $j \ne g$. When a goal exists and has position $Y$ on the goal level, the BFS search time is

$$t^{\mathrm{BFS}}_{\mathrm{CB}}(g, p_g, L, Y) = U_g + Y,$$

with expected value

$$t^{\mathrm{BFS}}_{\mathrm{CB}}(g, p_g, L \mid \Gamma_g) = U_g + \mathrm{tc}(p_g, L(0, g)).$$

Proof. When a goal exists, BFS will explore all of the top of the tree until depth $g - 1$ (that is, $U_g$ nodes) and $Y$ nodes on level $g$ before finding the first goal. The expected value of $Y$ is $\mathrm{tc}(p_g, L(0, g))$. □

The probability that level $k$ has a goal is $P(\Gamma_k) = 1 - q_k^{L(0,k)}$, and the probability that level $k$ has the first goal is $P(F_k) = P(\Gamma_k) \prod_{i=0}^{k-1} P(\bar\Gamma_i)$. By the same argument as in Proposition 10, the following proposition holds.

Proposition 16 (Branch Colliding Expected BFS Performance). The expected number of nodes that BFS needs to search to find a goal in a search problem with depth $D$, goal probabilities $p = [p_0, \dots, p_D]$, $p \ne 0$, and descendant counter $L$ is

$$t^{\mathrm{BFS}}_{\mathrm{CB}}(p, L) = \sum_{k=0}^{D+1} P(F_k)\, t^{\mathrm{BFS}}_{\mathrm{CB}}(k, p_k, L \mid \Gamma_k),$$

where the goal probabilities have been extended with an extra element $p_{D+1} = 1$, and $F_{D+1} = \bar\Gamma$ is the event that no goal exists.

For $p_k = 0$, $t^{\mathrm{BFS}}_{\mathrm{CB}}$ will be undefined, but this only occurs when $P(F_k)$ is also 0. Proposition 14 and 16 give (rough) estimates of average BFS and DFS graph search time given the goal distribution $p$ and the structure parameter $L$. The results can be combined to make a decision whether to use BFS or DFS (Figure 5).
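Proposition 16 only needs the level sizes L(0, 0), ..., L(0, D). The Python sketch below takes them as a list; it assumes that tc(p, m) is the mean of a geometric variable truncated to {1, ..., m}, and it gives the hypothetical level D + 1 a single node, which is an implementation choice made here so that the no-goal case costs the total number of nodes plus one. The function names are illustrative.

```python
def tc(p: float, m: int) -> float:
    """Mean of a Geo(p) variable conditioned on lying in {1, ..., m} (assumed meaning of tc)."""
    q = 1.0 - p
    return (1.0 - q**m * (1.0 + m * p)) / (p * (1.0 - q**m))


def bfs_cb(p: list, level_sizes: list) -> float:
    """Expected BFS search time; level_sizes[k] = L(0, k) for k = 0, ..., D."""
    probabilities = list(p) + [1.0]      # extra level D + 1 where every node is a goal
    sizes = list(level_sizes) + [1]      # assumption: one node on the extra level
    total, none_above, nodes_above = 0.0, 1.0, 0
    for pk, m in zip(probabilities, sizes):
        has_goal = 1.0 - (1.0 - pk) ** m            # P(Gamma_k)
        first_goal_here = has_goal * none_above     # P(F_k)
        if first_goal_here > 0.0:
            total += first_goal_here * (nodes_above + tc(pk, m))   # U_k + tc(p_k, L(0, k))
        none_above *= 1.0 - has_goal
        nodes_above += m
    return total
```

With level sizes 2^k the function reproduces the complete binary tree results of Section 5.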


7 Grammar Problems 

We now show how to apply the general theory of Section 6 to two concrete grammar problems. A grammar problem is a constructive search problem where nodes are strings over some finite alphabet $B$, and the neighbourhood relation is given by a set of production rules. Production rules are mappings $x \to y$, $x, y \in B^*$, defining how strings may be transformed. For example, the production rule $S \to Sa$ permits the string aSa to be transformed into aSaa. A grammar problem is defined by a set of production rules, together with a starting string and a set of goal strings. A solution is a sequence of production rule applications that transforms the starting string into a goal string. Many search problems can be formulated as grammar problems, with string representations of states modified by production rules. Their generality makes it computably undecidable whether a given grammar problem has a solution or not. We here consider a simplified version where the search depth is artificially limited, and goals are distributed according to a goal probability vector $p$.

Grammar problems exhibit two features not present in the complete tree 
model. First, it is possible for branches of the grammar tree to ‘die’. This 
happens if no production rule is applicable to the string of the state. Second, 
often the same string can be produced by different sequences, which means that 
the grammar search graph in general is not a tree. The following subsections 
apply the theory of colliding branches from Section 6 to simple grammar problems.

7.1 Binary Grammar 

Let $\epsilon$ be the empty string. The binary grammar consists of two production rules, $\epsilon \to a$ and $\epsilon \to b$, over the alphabet $B = \{a, b\}$. The starting string is the empty string $\epsilon$. A maximum depth $D$ of the search graph is imposed, and strings on level $k$ are goals with iid probability $p_k$, $0 \le k \le D$. Since the left hand substring of both production rules is the empty string, both can always be applied at any place to a given string. The resulting graph is shown in Figure 4.

Figure 4: Graph of binary grammar problem with max depth $D = 3$. Contiguous lines indicate first discovery by DFS, and dashed lines indicate rediscoveries.


Consider a node $v$ at level $d$. Its children are reached by either adding an $a$ or by adding a $b$. Let $\#a$ denote the number of $a$'s in $v$, and let $\#b$ denote the number of $b$'s in $v$. Then $\#a + 1$ distinct strings can be created by adding a $b$, and $\#b + 1$ distinct strings can be created by adding an $a$. In total then, $v$ will have $(\#a + 1) + (\#b + 1) = d + 2$ children. Nodes further to the right will have more of their children previously discovered. The number of parents of a node is the number of contiguous $a^*$ and $b^*$ segments. For example, bbaaab has three segments bb-aaa-b and three parents baaab, bbaab and bbaaa. A parent always differs from a child by the removal of one letter from one segment, and within a segment it is irrelevant which letter is removed.
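The counting argument above can be checked by brute force. The Python sketch below enumerates children (one inserted letter) and parents (one removed letter) of binary grammar strings; the helper names are illustrative.

```python
from itertools import groupby, product


def children(v: str) -> set:
    """All distinct strings obtained by inserting one 'a' or 'b' anywhere in v."""
    return {v[:i] + c + v[i:] for i in range(len(v) + 1) for c in "ab"}


def parents(v: str) -> set:
    """All distinct strings obtained by removing one letter from v."""
    return {v[:i] + v[i + 1:] for i in range(len(v))}


def segments(v: str) -> int:
    """Number of maximal runs of equal letters in v."""
    return sum(1 for _ in groupby(v))


def all_strings(d: int) -> list:
    """All strings of length d over the alphabet {a, b}."""
    return ["".join(letters) for letters in product("ab", repeat=d)]


def check_counts(max_level: int) -> bool:
    """Verify d + 2 children and one parent per segment for all levels up to max_level."""
    return all(
        len(children(v)) == d + 2 and len(parents(v)) == segments(v)
        for d in range(max_level + 1)
        for v in all_strings(d)
    )
```

The set comprehension removes duplicates automatically, which is exactly why inserting a letter inside a run of the same letter contributes only one child.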

The first node on level $n$ that DFS reaches in the binary grammar problem is $\delta_n = a^n$ for $0 \le n \le D$, assuming that the production rule $\epsilon \to a$ is always used first by DFS. The following lemma derives an expression for the descendant counter $L^{\mathrm{BG}}$ required by Proposition 14. Incidentally, the number of level-$d$ $\delta_n$-explorables $A_{n,d}$ (Section 6.1) gets an elegant form in the binary grammar problem.

Lemma 17. For $n \le d$, let $L^{\mathrm{BG}}(n, d) = |\{v : \mathrm{level}(v) = d,\ v \in \mathrm{descendants}(a^n)\}|$ be the number of nodes on level $d$ reachable from $a^n$, and let $A_{n,d} = L^{\mathrm{BG}}(n, d) - L^{\mathrm{BG}}(n + 1, d)$ be the number of descendants of $a^n$ that are not descendants of $a^{n+1}$. Then

$$L^{\mathrm{BG}}(n, d) = \sum_{i=0}^{d-n} \binom{d}{i}, \quad \text{and} \quad A_{n,d} = \binom{d}{d-n}.$$

Proof. The reachable nodes on level $d$ that we wish to count are $d - n$ levels below $a^n$. To reach this level we must add $i \le d - n$ number of $b$'s and $d - n - i$ number of $a$'s to $a^n$. The number of length $d$ strings containing exactly $i$ number of $b$'s is $\binom{d}{i}$ (we are choosing positions for the $b$'s non-uniquely with repetition among $d - i + 1$ possible positions). Summing over $i$, we obtain $L^{\mathrm{BG}}(n, d) = \sum_{i=0}^{d-n} \binom{d}{i}$, and $A_{n,d} = L^{\mathrm{BG}}(n, d) - L^{\mathrm{BG}}(n + 1, d) = \binom{d}{d-n}$. □
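Lemma 17 can be verified for small depths by enumerating the search graph. The Python sketch below compares the closed form with a brute-force expansion from a^n; the function names are hypothetical.

```python
from math import comb


def L_bg(n: int, d: int) -> int:
    """Closed form of Lemma 17: level-d nodes reachable from a^n."""
    if n > d:
        return 0
    return sum(comb(d, i) for i in range(d - n + 1))


def A_bg(n: int, d: int) -> int:
    """Level-d descendants of a^n that are not descendants of a^(n+1)."""
    return comb(d, d - n) if n <= d else 0


def reachable_on_level(n: int, d: int) -> set:
    """Brute force: apply both production rules everywhere, d - n times, starting at a^n."""
    frontier = {"a" * n}
    for _ in range(d - n):
        frontier = {
            v[:i] + c + v[i:]
            for v in frontier
            for i in range(len(v) + 1)
            for c in "ab"
        }
    return frontier
```

The brute-force set consists of the length-d strings with at least n letters a, which is what the sum over the number of b's counts.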

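Corollary 18 (Expected Binary Grammar BFS Search Time). The expected BFS search time $t^{\mathrm{BFS}}_{\mathrm{BG}}(p)$ in a Binary Grammar Problem of depth $D$ with goal probabilities $p = [p_0, \dots, p_D]$ is

$$t^{\mathrm{BFS}}_{\mathrm{BG}}(p) = t^{\mathrm{BFS}}_{\mathrm{CB}}(p, L^{\mathrm{BG}}).$$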
[Figure 5. Panel: Decision Boundary.]

Figure 5: The decision boundary predicted by Corollary 18 and 19 together with empirical outcomes of BFS and DFS search time. The scattered points are based on 100 independently generated binary grammar problems of depth $D = 14$ with uniformly sampled (single) goal level $g \in [8, 14] \cap \mathbb{N}$ and $\log(p_g) \in [-4, 0]$. DFS benefits from a deeper goal level and higher goal probability compared to BFS. The decision boundary gets 87% of the instances correct.


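Corollary 19 (Expected Binary Grammar DFS Search Time). The expected DFS search time $t^{\mathrm{DFS}}_{\mathrm{BG}}(D, p)$ in a binary grammar problem of depth $D$ with goal probabilities $p = [p_0, \dots, p_D]$ is bounded between $t^{\mathrm{DFS}}_{\mathrm{BGL}}(D, p) := t^{\mathrm{DFS}}_{\mathrm{CBL}}(D, p, L^{\mathrm{BG}})$ and $t^{\mathrm{DFS}}_{\mathrm{BGU}}(D, p) := t^{\mathrm{DFS}}_{\mathrm{CBU}}(D, p, L^{\mathrm{BG}})$, and is approximately

$$\tilde t^{\mathrm{DFS}}_{\mathrm{BG}}(D, p) := \tilde t^{\mathrm{DFS}}_{\mathrm{CB}}(D, p, L^{\mathrm{BG}}).$$

Proof of Corollary 18 and 19. Direct application of Lemma 17 and Proposition 16 and 14 respectively. □

The bounds are plotted for a single goal level in Figure 5 and 6.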


7.2 Random Grammar 

The Random Grammar Problem has alphabet $B = \{S, a, b\}$ and start string $S$. The production rules always include $S \to \epsilon$ (with $\epsilon$ denoting the empty string) plus a random subset of the adding rules $S \to Sa$, $S \to Sb$, $S \to aS$, $S \to bS$, and a random subset of the moving rules $Sa \to aS$, $Sb \to bS$, $aS \to Sa$, and $bS \to Sb$. Only strings containing no $S$ can be goal nodes. As usual, a maximum depth $D$ and a goal probability vector $p = [p_0, \dots, p_D]$ are given.

For simplified analysis, we will abuse notation in the following way. We will consider $S$-less nodes to be one level higher than they actually are. For example, we will consider $a$ to be on level 1, although it is technically on level 2 or lower (e.g. reached by the path $S \to Sa$, $S \to \epsilon$). A slight modification of BFS and DFS makes them always check the $S$-less child first (which is always child-less in turn), which means the change will only slightly affect search time. We will still consider $\delta_n = Sa^n$ whenever $S \to Sa$ is among the production rules, however.
[Figure 6. Panels: Complete Binary Tree, Binary Grammar.]

Figure 6: The expected search time of BFS and DFS as a function of a single goal level $g$ with goal probability $p_g = 0.05$ in a tree of depth $D = 20$. BFS has the advantage when the goal is in the higher regions of the graph, although at first the probability that no goal exists heavily influences both BFS and DFS search time. The greater connectivity of the graph in the binary grammar problem permits DFS to spend more time in the lower regions before backtracking, compared to the complete binary tree analysed in the previous section. This penalises DFS runtime when the goal is not in the very lowest regions of the tree. BFS behaviour is identical in both models.


The general case, in which a random set of production rules is used, is explored experimentally in Section 9. The special case of a binary tree arises when none of the moving rules are used, and either only the first two or only the last two of the adding rules. The analysis of Section 5.2 applies to this case. The special case when all rules are present can be analysed analytically by means of Section 6. We will call this case the full grammar problem.


7.3 Full Grammar 


The search graph of the full grammar problem is shown in Figure 7 (edges induced by moving rules are not shown). Since there are four adding rules that can be applied to each node, each node will have four children. Typically, when we move further to the right in the tree, more children will already have been discovered.

The full grammar problem can be analysed by a reduction to a binary grammar problem with the same parameters $D$ and $p$. Assign to each string $v$ of the binary grammar problem the set of strings that only differ from $v$ by (at most) an extra $S$. We call such sets node clusters. For example, $\{a, Sa, aS\}$ constitutes the node cluster corresponding to $a$. Due to the abuse of levels for the $S$-less strings, all members of a cluster appear on the same level in the full grammar problem (the level is equal to the number of $a$'s and $b$'s). The level is also the same as that of the corresponding string in the binary grammar problem.


Lemma 20 (Binary Grammar Reduction). For every $n, d$, $n \le d$, the descendant counter $L^{\mathrm{FG}}$ of the full grammar problem is $L^{\mathrm{FG}}(n, d) = (d + 2)\, L^{\mathrm{BG}}(n, d)$.
Figure 7: Search graph for the Grammar problem until level 2. Connections induced by moving rules are not displayed. Contiguous lines indicate the first discovery of a child by DFS and dashed lines indicate rediscoveries.


Proof. $L^{\mathrm{BG}}(n, d)$ counts the level $d$ descendants of $a^n$ in the binary grammar problem (BGP), and $L^{\mathrm{FG}}(n, d)$ counts the level $d$ descendants of $Sa^n$ in the full grammar problem (FGP). The node $u$ is a child of $v$ in BGP iff the members of the $u$ node cluster are descendants of $Sv$. Therefore the node clusters on level $d$ descending from $Sa^n$ in FGP correspond to the BGP nodes descending from $a^n$. At level $d$, each node cluster contains $d + 2$ nodes. □
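The cluster size used in the proof is easy to confirm. The Python sketch below builds the node cluster of a binary grammar string and derives the full grammar descendant counter from Lemma 17 and Lemma 20; the function names are illustrative.

```python
from math import comb


def cluster(v: str) -> set:
    """The string v itself plus every string with one extra S inserted into v."""
    return {v} | {v[:i] + "S" + v[i:] for i in range(len(v) + 1)}


def L_bg(n: int, d: int) -> int:
    """Binary grammar descendant counter (Lemma 17)."""
    return sum(comb(d, i) for i in range(d - n + 1)) if n <= d else 0


def L_fg(n: int, d: int) -> int:
    """Full grammar descendant counter (Lemma 20): d + 2 nodes per cluster."""
    return (d + 2) * L_bg(n, d)
```

A string of length d has d + 1 insertion points for the extra S, which together with the S-less string gives the d + 2 cluster members.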

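Corollary 21 (Expected Full Grammar BFS Search Time). The expected BFS search time $t^{\mathrm{BFS}}_{\mathrm{FG}}(p)$ in a full grammar problem of depth $D$ with goal probabilities $p = [p_0, \dots, p_D]$ is

$$t^{\mathrm{BFS}}_{\mathrm{FG}}(p) := t^{\mathrm{BFS}}_{\mathrm{CB}}(p, L^{\mathrm{FG}}).$$

Corollary 22 (Expected Full Grammar DFS Search Time). The expected DFS search time $t^{\mathrm{DFS}}_{\mathrm{FG}}(D, p)$ in a full grammar problem of depth $D$ with goal probabilities $p = [p_0, \dots, p_D]$ is bounded between $t^{\mathrm{DFS}}_{\mathrm{FGL}}(D, p) := t^{\mathrm{DFS}}_{\mathrm{CBL}}(D, p, L^{\mathrm{FG}})$ and $t^{\mathrm{DFS}}_{\mathrm{FGU}}(D, p) := t^{\mathrm{DFS}}_{\mathrm{CBU}}(D, p, L^{\mathrm{FG}})$, and is approximately

$$\tilde t^{\mathrm{DFS}}_{\mathrm{FG}}(D, p) := \tilde t^{\mathrm{DFS}}_{\mathrm{CB}}(D, p, L^{\mathrm{FG}}).$$

Proof of Corollary 21 and 22. Direct application of Lemma 20 and Proposition 16 and 14 respectively. □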


8 Experimental verification 


To verify the analytical results, we have implemented the models of Sections 5–7 in Python 3 using the graph-tool package (Peixoto 2015). The data reported in Tables 1–3 is based on an average of 1000 independently generated search problems with depth $D = 14$.


• The first number in each box is the empirical average, 

• the second number is the analytical estimate, and 

• the third number is the percentage error of the analytical estimate. 

For certain parameter settings, there is only a small chance ($< 10^{-3}$) that there are no goals. In such circumstances, all 1000 generated search graphs typically contain a goal, and so the empirical search times will be comparatively small. However, since a tree of depth 14 has about $2^{15} \approx 3 \cdot 10^4$ nodes (and a search algorithm must search through all of them in case there is no goal), the rarely occurring event of no goal can still influence the expected search time substantially. To avoid this sampling problem, we have consistently discarded all instances where no goal is present, and compared the resulting averages to the analytical expectations conditioned on at least one goal being present.

Figure 8: The decision boundary for the Gaussian tree given by Proposition 9 and 10 together with empirical outcomes of BFS vs. DFS winner. The scattered points are based on 100 independently generated problems with depth $D = 14$ and uniformly sampled parameters $\mu \in [5, 14] \cap \mathbb{N}$ and $\log(\sigma^2) \in [-2, 2]$. The most deciding feature is the goal peak $\mu$, but DFS also benefits from a smaller $\sigma^2$. The decision boundary gets 74% of the winners correct.

To develop a concrete instance of the multi goal level model we consider the special case of Gaussian goal probability vectors, with two parameters $\mu$ and $\sigma^2$. For a given depth $D$, the goal probabilities are given by


Pi = min 


20v^ ’2/- 


The parameter $\mu \in [0, D] \cap \mathbb{N}$ is the goal peak, and the parameter $\sigma^2 \in \mathbb{R}^+$ is the goal spread. The factor 1/20 is arbitrary, and chosen to give interesting dynamics between searching depth-first and breadth-first. No $p_i$ should be greater than 1/2, in order to (roughly) satisfy the assumption of Proposition 10. We call this model the Gaussian binary tree.


Complete Tree The accuracy of the predictions of Proposition 5 and are 
shown in Table 1[ and the accuracy of [Proposition 9 and jlOj in [Table 2| The 
relative error is always small for BFS (< 10%). For DFS the error is generally 
within 20%, except when the search time is small (< 35 probes), in which case 
the absolute error is always small. The decision boundary of [Proposition 7|is 
(a) BFS single goal level

(b) DFS single goal level

g \ p_g   0.001                  0.01                   0.1
…         … 15000 / 2.2%         … 1.9%
8         14530 / 15620 / 7.5%   … 9967 / 1.4%          … 1154 / 4.5%
11        11200 / 11140 / 0.5%   1535 / 1586 / 3.4%     152.3 / 146.0 / 4.1%
14        1971 / 2000 / 1.4%     208.8 / 200.0 / 4.2%   30.57 / 20.00 / 35%

Table 1: BFS and DFS performance in the single goal level model with depth
D = 14, where g is the goal level and p_g the goal probability. Each box
contains empirical average/analytical expectation/error percentage.
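
The error percentages in the table boxes appear to be measured relative to the empirical average. This reading is inferred from surviving entries such as 14530/15620/7.5% and 30.57/20.00/35%, not stated explicitly in the text, so the helper below is a sketch under that assumption (the function name is illustrative).

```python
def error_percentage(empirical, analytical):
    """Deviation of the analytical expectation, in percent of the empirical average."""
    return abs(empirical - analytical) / empirical * 100

# Boxes taken from the DFS single goal level results.
examples = [(14530, 15620), (152.3, 146.0), (30.57, 20.00)]
for emp, ana in examples:
    print(f"{emp} / {ana} / {error_percentage(emp, ana):.1f}%")
```

Because the printed averages are themselves rounded, recomputed percentages can differ from the tabulated ones in the last digit.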
(a) BFS multi goal level

(b) DFS multi goal level

μ \ σ²   …
…        … 5949 … 9.8%          1234 / 1259 / 2.1%     454.6 / 473.6 / 4.2%   252.6 / 260.0 / 2.9%
11       97.38 / 92.95 / 4.5%   168.1 / 157.4 / 6.4%   117.4 / 106.7 / 9.1%   210.0 / 211.7 / 0.8%
14       24.00 / 11.62 / 52%    43.38 / 32.89 / 24%    81.75 / 74.46 / 8.9%   213.6 / 205.0 / 4.0%

Table 2: BFS and DFS performance in Gaussian binary trees with depth
D = 14. Each box contains empirical average/analytical expectation/error
percentage.


shown in Figure 2, and the decision boundary of Proposition 9 vs. 10 is shown
in Figure 3. These boundary plots show that the analysis generally predicts the
correct BFS vs. DFS winner.


Grammar. The binary grammar model of Section 7.1 serves to verify the
general estimates of Propositions 14 and 16. The results are shown in Table 3.
The estimates for BFS are accurate as usual (< 3% error). With few exceptions,
the lower and upper bounds t_BGL^DFS and t_BGU^DFS of Corollary 19 for DFS
differ by at most 50% on the respective sides from the true (empirical) average.
The arithmetic mean often gives surprisingly accurate predictions (< 4%), except
when t_BGL^DFS and t_BGU^DFS leave wide margins as to the expected search time
(when g = 14, the margin is up to 84% downwards and 125% upwards). Even then,
the error remains within 30%.


(a) BFS   (b) Average DFS   (c) Lower DFS bound t_BGL^DFS   (d) Upper DFS bound t_BGU^DFS

Table 3: BFS and DFS performance in binary grammars of depth D = 14.
Empirical DFS performance is compared to the upper and lower bounds of
Corollary 19 as well as their arithmetic average. In these experiments, goals are
distributed on a single goal level g with goal probability p_g. The BFS estimates
t_BG^BFS are highly accurate, and the averaged DFS estimates are mostly
accurate. Each box contains empirical average/analytical expectation/error
percentage.


9 Empirical Predictions

The Random Grammar model of Section 7.2 exhibits a rich variety of topological
features such as properties of the branching factor distribution. It is an interesting
question to what extent such features can be used to predict whether BFS or
DFS is the better search method. However, it is harder to approach analytically,
not least because of its many cases. We therefore approached this problem
empirically.

We generated a data set with 1827 randomly sampled random grammars.
The sampling was done uniformly from the following sets: First sample a number
of rules r ∈ [4, 8] ∩ ℕ, then a random size r subset of the 8 possible production
rules. Also sample maximum depth D ∈ [11, 15] ∩ ℕ. Sample a number of goals
n ∈ [3, bD] ∩ ℕ. Sample n times a level k ∈ [1, D] ∩ ℕ and a node on level k;
make the sampled node a goal.
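
The sampling steps above can be sketched as follows. This is an illustration rather than the generator used for the data set: rules are identified by the hypothetical indices 0..7, and the upper bound on the number of goals is passed in as the parameter `max_goals` instead of being derived from the depth. Choosing the actual goal node on each sampled level requires the grammar's search graph and is left out.

```python
import random

def sample_grammar_config(rng, max_goals):
    """Sample the parameters of one random grammar instance.

    ASSUMPTIONS: rule indices 0..7 are placeholders for the 8 production
    rules, and `max_goals` (>= 3) stands in for the upper bound on the
    number of goals.
    """
    n_rules = rng.randint(4, 8)                      # r in [4, 8]
    rules = sorted(rng.sample(range(8), n_rules))    # size-r subset of the 8 rules
    depth = rng.randint(11, 15)                      # D in [11, 15]
    n_goals = rng.randint(3, max_goals)              # n in [3, max_goals]
    goal_levels = [rng.randint(1, depth) for _ in range(n_goals)]  # k in [1, D]
    return {"rules": rules, "depth": depth, "goal_levels": goal_levels}

rng = random.Random(0)
configs = [sample_grammar_config(rng, max_goals=20) for _ in range(5)]
```

Each returned dictionary fixes which rules are active, the maximum depth, and the level of every goal; all draws are uniform over their integer ranges.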

We trained a Support Vector Machine from the scikit-learn package
(Pedregosa et al. 2011) with a (Gaussian) Radial Basis Kernel to predict whether
BFS or DFS would find a first goal faster. The features we predicted from were:


• Mean branching factor

• Standard deviation of branching factor

• Number of rules

• Maximum depth
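
The four features listed above could be assembled from a sample of expanded nodes roughly as follows. The sketch assumes a hypothetical `successors` mapping from each expanded node to its children, and it uses the population standard deviation; the text does not say which variant was used.

```python
from statistics import mean, pstdev

def branching_features(successors, n_rules, max_depth):
    """Feature vector [mean b, std b, number of rules, maximum depth].

    `successors` maps each expanded node to the list of its children; it is a
    hypothetical stand-in for whatever sample of the search graph is available.
    """
    factors = [len(children) for children in successors.values()]
    return [mean(factors), pstdev(factors), n_rules, max_depth]

# Toy sample: four expanded nodes with 2, 1, 3 and 2 children.
sample = {"S": ["A", "B"], "A": ["C"], "B": ["D", "E", "F"], "C": ["G", "H"]}
features = branching_features(sample, n_rules=5, max_depth=12)
```

A vector of this shape, one per grammar, is the kind of input a classifier would be trained on.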

The best parameter settings for the support vector machine were C = 1000,
degree = 3, γ = 0.1. Against a cross-validation data set, the trained support
vector classifier got the BFS/DFS winner correct in 63% of the cases, in a dataset
where DFS won 55% of the time. The results give an early indication that it may
be possible to predict the best search method based solely on locally estimable
features of the search graph.


10 Discussion 


Search and optimisation problems appear in different flavors throughout the
field of artificial intelligence: in planning, problem solving, games, and learning.
Therefore even minor improvements to search performance can potentially lead
to gains in many aspects of intelligent systems. It is even possible to equate
intelligence with (Bayesian expectimax) optimisation performance (Legg and
Hutter, 2007).


Summary. In this report we have derived analytical results for expected
runtime performance. Section 5 focused on BFS and DFS tree search where
explored nodes were not remembered. A vector p = (p_1, ..., p_D) described a
priori goal probabilities for the different levels of the tree. This concrete but
general model of goal distribution allowed us to calculate approximate
closed-form expressions for both BFS and DFS average runtime. Earlier studies
have only addressed worst-case runtimes: Knuth (1975) and followers for DFS;
Korf et al. (2001) and followers for IDA*, effectively a generalised version of BFS.

Sections 6 and 7 generalised the model of Section 5 to non-tree graphs. In
addition to the goal probability vector p, the graph search analysis required
additional structural information in the form of a descendant counter L. The
graph search estimates also took the form of less precise bounds. The analysis
of Section 6 does not supersede the analysis in Section 5, as the bounds of
Section 6 become uninformative when the graph is a tree. The results are
generally consistent with empirical reality.


Conclusions and Outlook. The value of the results is at least twofold.
They offer a concrete means of deciding between BFS and DFS given some
rough idea of the location of the goal (and the graph structure). To make the
results more generally usable, automatic inference of model parameters would
be necessary; primarily of goal distribution p and graph structure L. (The depth
D will often be set by the searcher itself, and perhaps be iteratively increased.)
There is good hope that the descendant counter L can be estimated online from
the local sample obtained during search, similar to (Knuth, 1975). The goal
distribution is likely to prove more challenging, but resembles the automatic
creation of heuristic functions, so techniques such as relaxed problems could well
prove useful (Pearl 1984). Estimates of goal distribution could possibly also be
inferred from a heuristic function.

The results also offer theoretical insight into BFS and DFS performance. As
BFS and DFS are in a sense the most fundamental search operations, we have
high hopes that our results and techniques will prove useful as building blocks
for analysis of more advanced search algorithms as well. For example, A* and
IDA* may be viewed as generalisations of BFS, and Beam Search and Greedy
Best-First as generalisations of DFS.
A List of notation


P                      Probability
X, Y                   Random variables
E[·]                   Expectation of a random variable
O                      Big-O notation
EC                     Edge cost
h                      Heuristic function
g                      Accumulated path cost from start node
Q                      Objective function
D                      Maximum depth of search space
p_g                    Goal probability at a single goal level g
p_k                    Goal probability for a level k
p                      Vector of probabilities for multiple goal levels
μ, σ²                  Goal peak and goal spread in Gaussian binary tree
Γ                      Probability that a goal exists
Γ_k                    Probability that level k has a goal
F_k                    Probability that level k has the first goal
t_SGL^BFS, t_SGL^DFS   Expected BFS search time and approximate expected DFS search
                       time in a complete tree with a single goal level
t_MGL^BFS, t_MGL^DFS   Expected BFS search time and approximate expected DFS search
                       time in a complete tree with multiple goal levels
t_CB^BFS, t_CB^DFS     Expected BFS search time and approximate expected DFS search
                       time in a graph with colliding branches
t_BG^BFS, t_BG^DFS     Expected BFS search time and approximate expected DFS search
                       time in the binary grammar problem
t_FG^BFS, t_FG^DFS     Expected BFS search time and approximate expected DFS search
                       time in the full grammar problem
δ_n                    The first node on level n reached by DFS
L(n, d)                Descendant counter, counting the number of level d descendants
                       reachable from δ_n
L^BG, L^FG             Descendant counters for the binary grammar problem and the full
                       grammar problem
A_{n,d}                Number of nodes reachable from δ_n not reachable from δ_{n+1}
S_n                    Descendants of δ_n
T_n                    Descendants of δ_n that are not descendants of δ_{n+1}
U_n                    The number of nodes above level n
τ_n                    The probability that T_n contains a goal (Lemma 13)
φ_n                    The probability that T_n inhabits the first goal
b                      Branching factor
ε                      Empty string
B List of search problems

The following is an incomplete list of problems naturally modelled as search 
problems, organised by type. 

• Puzzles 

— N-puzzle 

— Instant insanity (Knuth 1975) 

— Eternity II (Assembly puzzles) 

• Infinite 

— Grammar (aka Production system; simpler PSVN) 

— STRIPS planning (PDDL language) 

— Root-factorial (Knuth) 

• Real-world problems 

— MDP

— SAT

— VLSI chip design (cell layout, channel routing) 

— Robot navigation (continuous) 

— Route finding 
— Tour finding
— Assembly sequencing 
— Protein design 

— Jobshop (who does what task when) 

• Other 


— Towers of Hanoi 

— Cannibal-missionary 

— Sokoban 

— Rubik’s cube 

— Sudoku 

— Knight jumping 

— N-queens 

— Belief state (in a deterministic, partially observable world) 


— Quasigroup completion problem (Gomes et al. 1997), naturally CSP 

— Counterfeit coin problem (Pearl 1988)
C List of Topological features


The neighbourhood relation N induces a topology on the state space S. The
goal of this study is to investigate more closely how topological features of the
search graph affect search performance. The following is a list of potentially
useful features, where +, ?, - ranks the features by a priori likelihood of being
useful. Please refer to (Diestel 2006) for a standard reference on graph theory
and for explanations and definitions of the terms below.


• Problem type 

+ Directed/undirected graph 

• State space 

+ number of nodes (finite or infinite) 

- number of edges 

• Graph structure 

- bipartite/k-partite

+ clique size distribution 

+ clique covering number (how many cliques are required to cover the 
graph?) 

- chordal (every cycle of length ≥ 4 has a “chord”)

- stability number (greatest number of non-connected nodes) 

+ degree (max/min/average) 

- min cycle (girth) 

- max cycle (circumference) 

- diameter = max_{x,y} distance(x, y)

- radius = min_x max_y distance(x, y) (x is ‘the centre’ of the graph).
The radius satisfies radius(G) ≤ diameter(G) ≤ 2 · radius(G).

• Tree structure 

+ height 
+ max width 

+ width as a function of depth 

• Branching factor 

+ Distribution (possibly as a function of depth) 

+ Max 

+ Effective (when using heuristic) 

• Path length 

+ Max/average path length from origin, repeating allowed/disallowed 

• Path redundancy 

+ k-connectedness (every node pair has at least k independent paths)

- l-edge-connectedness (no cutset of fewer than l edges)

+ distribution of connectedness/revisiting frequency 
• Matrix properties

+ Spectrum (Spectral Graph Theory)

• Edge space

? Cyclomatic number (defined in (Diestel 2006, p. 24) as the dimension
of the space formed by cycles, a subspace of the space of edges under
“symmetric difference”)

- Cut space dimension 

Many properties of the search graph can be estimated from local samples.
For example, Wu and Preciado (2013) show that the spectrum of the graph (i.e.,
the eigenvalues of the matrix representation of the graph) can be estimated this
way.
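
Several of the listed features, such as degree, diameter and radius, are straightforward to compute exactly when a small connected graph is available in full. The sketch below does so by breadth-first search from every node; the toy graph and all names are hypothetical, and it does not attempt the local-sample estimation discussed above.

```python
from collections import deque

def eccentricities(adj):
    """Map each node to its greatest BFS distance to any other node (connected graph)."""
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ecc[source] = max(dist.values())
    return ecc

# Hypothetical toy graph: the path a - b - c - d with an extra leaf e attached to b.
adj = {
    "a": ["b"], "b": ["a", "c", "e"], "c": ["b", "d"], "d": ["c"], "e": ["b"],
}
ecc = eccentricities(adj)
diameter = max(ecc.values())   # greatest eccentricity
radius = min(ecc.values())     # smallest eccentricity, attained at a centre
degrees = [len(neighbours) for neighbours in adj.values()]
```

On this graph the computed values satisfy radius(G) ≤ diameter(G) ≤ 2 · radius(G), the inequality noted in the feature list.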
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